INTRODUCTORY

The  Problem 

Because  of  the  tangible  and  quantitative  nature  of  its  product, 
the  typing  process  is  well  suited  to  correlational  investigation. 
The  aims  of  this  study  are  to  determine  the  interrelations  of  the 
objective  and  measurable  elements  of  a  group  of  typed  papers  with 
reference  to  their  value  as  criterion  measures  of  proficiency,  and  to 
ascertain  the  extent  to  which  psychological  test  scores  and  similar 
variables  are  correlated  with  these  purely  objective  measures. 
Previously  reported  experiments,  in  which  the  purpose  has  usually 
been  to  discover  predictive  tests,  indicate  a  tendency  toward  positive 
correlation  with  test  scores,  but  in  the  present  study  the  effort  has 
been  not  so  much  to  establish  the  fact  of  a  positive  correlation  as  to 
ascertain  some  of  the  specific  factors  giving  rise  to  this  correlation. 


CHAPTER  I 
A  Review  of  Previous  Investigations 

The  earliest  experimentation  in  typing  was  in  the  study  of  the 
learning  process  in  general,  and  typed  papers  were  regarded  only  as 
material  for  this  purpose  rather  than  as  a  measure  of  a  specific  ability 
to  be  investigated  for  its  own  interest.  The  most  extensive  of  such 
studies  was  that  of  Book  (1908)  which  still  remains  a  classic  in  the 
experimental  psychology  of  learning. 

The  first  predictive  experiment  in  typing,  which  is  also  interesting 
historically as the earliest reported "prognostic" experiment of any
kind  in  vocational  guidance,  is  that  of  Lough  (1912)  upon  high 
school  boys,  presumably  six  in  number.  The  criterion  measures  of 
typing  were  evidently  ratings  by  an  independent  observer,  probably 
teachers'  marks.  One  test  was  given,  a  letter  for  letter  substitution 
test,  the  scores  consisting  of  the  author's  ratings  on  the  quality  of  the 
practice curves obtained from repeated applications of the test. The
results were presented graphically in what would correspond technically
to a simple regression line, and this obtained line was compared
to a line representing perfect correlation. The exact data of Lough's
report, the purpose of which was to present the possibilities of
vocational guidance by psychological test methods, are not easy to
ascertain. Upon the interpretation which is made here, Muscio
(1921, p. 15) calculates a correlation coefficient of .92±.04 between
"habit formation" and typing proficiency.

Lahy  (1913)  upon  N  =  11  subjects,  consisting  of  six  women  of  two 
to four years' experience and five men of two to eight years' experience,
employed as criterion measures the ratings made by a director
of  a  typewriting  office  and  by  several  expert  typists  upon  the  speed, 
number  of  errors,  and  "arrangement"  of  the  results  of  a  copying 
test,  the  time  required  to  copy  the  test  ranging  from  3'  25"  to  11'  20" 
among  the  several  subjects.  On  the  basis  of  this  test  the  typists 
were  separated  into  good  and  mediocre  groups,  and  the  performances 
of  the  two  groups  in  a  series  of  psychological  and  physiological  tests 
were  compared.  Muscio  (1921,  p.  14)  has  recalculated  Lahy's  data, 
and reports these correlation coefficients: "Immediate memory for
concrete phrases, .60, immediate memory for digits, .27, muscular
symmetry (tendency of the two hands to be identical in strength)
.68, tactile sensibility, .77, muscular sensibility, .58, sustained
attention (judged by errors in the cancellation test), .18, sustained
attention (judged by time taken in the cancellation test), .14, and
auditory reaction-time, -.34. . . . Some tests for higher mental
functions (abstraction and reasoning) were also given, but they
yielded no positive results and it is not said exactly what they were."

The earliest reported investigations employing the method of
coefficiential correlation were made by Rogers (1917 and 1922) upon
two  types  of  subjects.  Upon  students  in  typing  classes  in  Columbia 
University  who  were  finishing  their  first  year  of  training,  he  employed 
as  criterion  measures  the  results  of  the  monthly  typewriting  tests 
given  in  dictation  by  the  instructor,  consisting  of  the  amount  of 
material  copied  in  ten  minutes  minus  a  penalty  of  five  words  per 
error.  Upon  two  groups  of  typists  in  the  employ  of  a  large  retail 
commercial  concern  in  Brooklyn,  New  York,  of  whom  N  =  38  girls 
had  been  working  in  the  same  division  for  at  least  ten  months,  and 
N  =  65  girls  had  had  between  six  weeks  and  six  months  of  practical 
experience,  he  employed  as  criterion  the  average  number  of  perfect 
typed  sheets  produced  during  certain  carefully  selected  periods. 
The work was a very specialized one, consisting in making out
separate departmental order forms for the commodities listed in the
original order as received from the customer. For the more experienced
group the papers were chosen for a five-week period during
which  the  group  collectively  made  its  best  record,  this  period  being 
selected  on  the  basis  of  a  study  of  the  output  during  ten  months. 
For  the  less  experienced  group  the  period  chosen  covered  the  two 
weeks  in  which  each  subject  made  her  best  record  individually. 
Upon  the  university  group,  N's  =  27  to  42,  the  following  product 
moment  r's  were  reported,  the  test  results  being  correlated  with 
each  of  six  monthly  routine  tests:  verb-object  ranged  from  .21  to  .57; 
number  group  checking  ranged  from  —.01  to  .53;  color  naming 
ranged  from  .29  to  .61;  action-agent  ranged  from  .00  to  .43;  agent- 
action  ranged  from  —.02  to  .40;  form  substitution  ranged  from  .11 
to  .42;  hard  directions  ranged  from  .11  to  .34;  mixed  relations  ranged 
from  —.09  to  .25;  and  opposites  from  .07  to  .54.  Upon  the  more 
experienced  employed  group,  N  =  38,  the  coefficients  for  these  tests 
were .28, .28, .39, .13, .02, .04, -.20, -.07, and -.09, respectively,
and upon the less experienced group, N = 65, the r's were .33, .34,
.36, .29, .18, .17, .16, .03, and -.07, respectively. Upon the experienced
groups correlations were calculated upon two other criteria:
(1) the average output of the week in which the group made the best
score collectively, and (2) the best day's output of each typist
individually. The r's upon these criteria showed a slight superiority
over those based upon the longer period.

Upon  nine  groups  of  Des  Moines  high  school  students,  N's  =  13 
to  26,  Robinson  (1920,  unpublished)  employed  two  criterion  measures 
of  typing  proficiency:  (1)  the  results  of  the  routine  ten-minute 
Remington  speed  test,  each  error  being  penalized  ten  words,  and  (2) 
instructor's  order  of  merit  rankings.  Against  these  were  correlated 
the  results  of  two  "intelligence"  tests  and  two  "motor"  tests:  (a) 
Army  Alpha,  form  5,  (b)  the  Vasey  (1919)  vocabulary  test,  (c)  the 
Ream  (1919  and  1922)  tapping  test,  and  (d)  the  Hansen  (1920  and 
1922) serial action test. The composite ranking in the four psychological
tests yielded the following coefficients obtained by Spearman's
Footrule formula and converted into r-values by the usual table:
for the speed test, mean value of eight positive coefficients = .282,
the range being from an unstated negative value to .500, and for the
instructor's rankings, mean of nine coefficients = .346, range from
.192 to .528. With instructor's rankings, coefficients were reported
as follows for the separate tests: for chronological age, mean of
eight positive coefficients = .240, range from an unstated negative
value to .528; Army Alpha, mean of eight positive coefficients = .237,
range from an unstated negative value to .528; vocabulary, mean
of seven positive coefficients = .197, range from two unstated
negative values to .338; tapping, mean of five positive coefficients
= .144, range from four unstated negative values to .369; serial action
(time), mean of six positive coefficients = .185, range from three
unstated negative values to .414; and for serial action (errors), mean
of six positive coefficients = .188, range from three unstated negative
values  to  .528. 
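The error-penalized speed scores used by Rogers (five words per error) and by Robinson (ten words per error) amount to simple arithmetic, which may be sketched as follows. The function name and the example figures are hypothetical and are not taken from either study; whether negative results were floored at zero is not stated in the source, so no floor is applied here.

```python
def net_words(gross_words: int, errors: int, penalty_per_error: int) -> int:
    """Gross words typed in the test period minus a fixed word penalty per error."""
    return gross_words - penalty_per_error * errors


# Hypothetical paper: 400 words typed in ten minutes with 6 errors.
rogers_style = net_words(400, 6, penalty_per_error=5)     # 400 - 30 = 370
robinson_style = net_words(400, 6, penalty_per_error=10)  # 400 - 60 = 340
print(rogers_style, robinson_style)
```

The same paper thus receives a lower score under the heavier penalty, so scores obtained under the two schemes are not directly comparable.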

In  the  report  of  Poffenberger  (1922)  upon  N  =  62  students  in  a 
"high  grade  secretarial  school  in  New  York  City,"  the  instructors' 
ratings  of  proficiency  in  typing  were  correlated  against  scores  made 
in  Army  Alpha.  For  total  Alpha  the  correlation  (presumably  by 
Spearman's  rho-formula)  was  .46,  and  for  the  scores  in  the  eight 
elements  of  Alpha  the  coefficients  were  .59,  .78,  .45,  .41,  .46,  .60,  .62 
and  .11. 
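For readers unfamiliar with the rank method presumed above, Spearman's rho may be sketched as follows. This is a minimal illustration of the textbook formula rho = 1 - 6*sum(d^2) / (n(n^2 - 1)), not a reconstruction of Poffenberger's computation; the function names and the data are hypothetical, and the formula is exact only when there are no tied ranks.

```python
def ranks(values):
    """Ranks from 1 (smallest) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d being the difference in ranks."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical data: instructor's ratings and test scores for five students.
ratings = [3, 1, 4, 2, 5]
scores = [110, 95, 150, 120, 130]
print(spearman_rho(ratings, scores))
```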

Several  types  of  objective  criterion  measures  were  used  by 
Brewington  (1922  and  1923).  Upon  one  of  her  groups  composed 
of  twenty  beginning  students  of  typing  in  the  University  of  Chicago 
the  criteria  were  (1)  number  of  errors  and  the  time  required  to  achieve 
a  perfect  copy  of  a  stated  number  of  lessons,  (2)  number  of  strokes 
made  in  class  drills  with  deductions  for  errors,  and  (3)  the  results  of 
the monthly fifteen-minute Underwood Awards tests. Upon the
other group composed of twenty-two students in a business college
in Chicago with little or no previous training, the criterion was the
number of hours required by the subject to make a perfect copy of a
given assignment. Two types of tests were used, five from the
Woodworth-Wells (1911) series: action-agent, verb-object, digit
cancellation, form substitution and color naming, all except the last
being given as group tests; and two special tests requiring the use of a
typewriter as an instrument, one a "serial reaction" test in which the
striking of the key corresponding to a number on the indicator
automatically brings up the next number to be struck, and the other a
"rhythm"  test  consisting  of  striking  the  keys  rhythmically  in 
accordance  with  a  schedule.  Upon  the  university  group,  N's  =  13 
to  20,  the  correlations  (rho's)  for  the  single  association  tests  with 
the  several  criteria  were  generally  positive,  but  seldom  as  large  as 
twice  their  probable  errors.  Upon  the  business  college  group, 
N's  =  13  to  22,  the  coefficients,  which  were  reported  only  for  the 
form  substitution  and  color  naming  tests  among  the  association 
tests,  were  higher,  usually  falling  between  twice  and  three  times 
their  probable  errors  for  four  samplings  of  class  work.  The  two 
instrument  tests  yielded  much  higher  correlations.  Upon  the 
university  group,  N's  =  13  to  18,  for  the  "serial  reaction"  test,  the 
eighteen  rho's  reported  ranged  from  .35  to  .75,  and  upon  the  business 
college  group,  N's  =  15  to  22,  the  twelve  rho's  ranged  from  .41  to 
.82. The "rhythm" test upon the university group, N's = 13 to 17,
yielded  five  rho's  ranging  from  -.26  to  .16,  four  being  negative,  but 
upon  the  business  college  group,  N's  =  15  to  22,  four  rho's  ranged 
from  .47  to  .76. 

Chapman  (1919)  in  studying  the  learning  curves  in  typing  upon 
N  =  20  students  aged  16  to  18  years,  reported  that  the  three  inter- 
correlations of the three samplings, 20 to 25 hours of practice, 76 to
91  hours,  and  136  to  151  hours,  ranged  from  .65  to  .66.  The  measures 
consisted  of  weekly  five-minute  copying  tests  upon  material  from 
Addison's  Essays  altered  in  order  to  secure  homogeneity. 

In  a  unique  study  Hoke  (1922,  pp.  20-23)  reported  that  one  factor 
influencing  the  accuracy  of  typed  papers  was  the  relative  frequency 
in the use of the letters and characters of the keyboard: the product
moment correlation of these relative frequencies as ascertained
from extensive word and letter counts in representative material
was .924±.021 with the percentage of accuracy of these letters and
characters in 500 full-size papers of practice material obtained from
approximately 100 individuals. Absolute position on the keyboard
and position relative to other letters in the typed material appeared
to manifest no relation with accuracy.

In  1923  appeared  a  series  of  five  carefully  composed  three-minute 
copying  tests  in  typing  by  Blackstone  (1922  and  1923)  based  upon 
strokes  as  units,  the  average  stroke-intensity  being  5.6  strokes  per 
word  to  conform  to  the  findings  of  a  study  of  the  ordinary  business 
letter  vocabulary.  Considerable  experimental  work  is  reported,  the 
conclusions  of  which  were  that  upon  the  stroke  basis,  the  results  of 
a  three-minute  test  were  as  "regular"  as  those  of  tests  of  one  minute, 
two minutes, etc., up to ten minutes (1923, Manual, p. 2), and also
that "there is an indication that ... the length of words, within
reasonable limits, does not greatly affect the rate or the accuracy" (1922,
p. 8). In order to obtain homogeneity of material throughout all
five  tests,  each  was  given  approximately  the  same  total  number  of 
words  and  of  strokes,  the  same  number  of  words  of  one  letter,  two 
letters,  etc.,  the  same  distribution  of  long  or  difficult  words,  and 
approximately  the  same  number  of  strokes  for  each  hand.  Speed 
and  accuracy  were  combined  into  a  single  score  by  the  presumably 
arbitrary  formula 

    Score = (Strokes per minute × 10) / (Errors + 10)

Blackstone  also  prescribes  a  five-minute  practice  period  on  material 
other  than  the  test  before  beginning  the  test  proper.  He  reports 
that "the reliability coefficient of correlation between two forms of
the  test  averages  .93  for  groups  of  pupils  who  have  had  20  months  of 
instruction." 
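The combined score described above, strokes per minute times ten divided by errors plus ten, may be sketched as follows. The function name and the example figures are hypothetical; only the formula itself comes from the text.

```python
def blackstone_score(strokes: int, minutes: float, errors: int) -> float:
    """Score = (strokes per minute * 10) / (errors + 10)."""
    strokes_per_minute = strokes / minutes
    return strokes_per_minute * 10 / (errors + 10)


# Hypothetical three-minute paper of 600 strokes.
print(blackstone_score(600, 3, 0))   # 200.0: an errorless paper scores its stroke rate
print(blackstone_score(600, 3, 10))  # 100.0: ten errors halve the score
```

Under this formula an errorless paper receives a score equal to its rate in strokes per minute, and each added error reduces the score by a progressively smaller amount.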

In a second predictive study of typing Robinson (1921, unpublished)
made use of Blackstone's three-minute copying test form A,
scored however in terms of an arbitrary point scale which its author
has since discarded in favor of the formulaic score (v.s.). Upon
N = 255 Detroit high school students he reported the following
product moment r's for the nine tests employed: Horn's checking of
misspelled words, .35; Burt's number finding test, .25; Iowa
Comprehension B 1: 4-1-21, .23; a letter-digit substitution test, .20; a
writing speed with recovery from interruption test, .19; a "motility"
test consisting of writing the consecutive integers from 11 upwards
as rapidly as possible, .19; a letter cancellation test, .18; underlining
adjacent number pairs whose sum is 10, .10; and underlining adjacent
letters forming English words, .07.

In a generally similar experiment Ohmann (1924, unpublished)
employed as a criterion Blackstone's three-minute copying test
form D. Upon N = 225 Detroit high school students he obtained
the following correlations: spelling, from the Iowa Spelling List, .36;
quality of handwriting as measured by the Ayres "Gettysburg"
scale, .23; a language test consisting of checking errors in grammar,
punctuation and spelling in typical business letter material, .22;
Woodworth-Wells hard directions, .07; Vasey's (1919) vocabulary
test, .06; a "motility" test, i.e. the speed of writing groups of three
short "I's" upon paper ruled into centimeter squares, .04; Pyle's
symbol-digit test, .04; intelligence quotient determined from the
Otis Self-Administering Test of Mental Ability, Higher Examination,
form A, -.02; and memory span for groups of consonants, -.14.
Upon the instructor's ratings by the use of a rating scale for seven
traits (appearance, thoroughness, courtesy, initiative, cooperativeness,
self-control, and likability) the coefficient was .22 upon N = 133
cases.  A  stenography  test  also  was  given,  consisting  of  transcription 
for  ten  minutes  from  the  students'  notes  made  upon  Blackstone's 
typing  copying  test  form  E,  succeeding  portions  of  which  were  read 
at  four  different  rates  of  speed  in  increasing  order.  The  correlation 
of  these  measures  with  the  typing  criterion  was  .31,  N  =  225. 

Another  study  employing  tests  similar  to  those  of  Robinson  and 
Ohmann  was  that  of  Tuttle  (1923)  whose  criterion  measure  was 
stated  as  a  "typewriting  test"  with  no  further  details.  Upon  a  group 
of  N  =  20  students  beginning  the  study  of  typewriting  he  obtained 
the  following  coefficients  for  these  eight  tests:  "attention  and 
accuracy,"  Part  I  (underlining  adjacent  numbers  whose  sum  is  9), 
.41 and Part II (underlining adjacent combinations of x and n), .68;
"quick motor action" (speed of striking a single typewriter key),
.54; a symbol-digit substitution test, .52; a directions test, .17;
Seashore's sense of time test (by phonograph record), .10; and memory
span (reproduction of lines of abstract words) Part I, -.30, and
Part  II,  -.11. 

In  the  most  recent  of  the  Iowa  group  of  experiments  Johnson 
(1925,  unpublished)  employed  as  criterion  the  mean  of  the  five 
Blackstone  three-minute  copying  tests  (v.s.).  The  subjects  consisted 
of  four  successive  classes  of  high  school  students.  The  psychological 
tests  were  administered  before  the  instruction  in  typing  and  the 
criterion  measures  were  taken  after  three  semesters  of  training. 
Upon  the  amalgamated  group  of  N  =  124  students  the  following 
correlations  were  reported:  spelling  of  difficult  words  arranged  in 
the form of a business letter, .28; an Iowa State Spelling List (Grade
VIII,  50  words),  .24;  Monroe  Silent  Reading  Test  (III-l),  for  rate, 
.22,  and  for  comprehension,  .12;  "Mechanical  A,"  i.e.  the  rate  of 
tapping  the  key  j  on  a  typewriter  during  four  periods  of  15  seconds 
each with rest intervals of 15 seconds between trials, .10;
"Mechanical B," i.e. the rate of tapping alternately the letters j and f
during three periods of 30 seconds each with rest intervals of 15
seconds, .09; "Mechanical C," i.e. tapping successively j, f, and the
space bar during three periods of 30 seconds with rest intervals of
15 seconds, .19; Greene's Organization Test, Series C, English, Test
No. 1, Form B, .004; and Pressey's "Mental Survey Scale," Schedule
D, Scale No. 1, .002. The coefficients reported for the four comparable
subgroups, whose populations ranged from 22 to 37, showed
considerable  variability. 

Link  (1919,  pp.  411-423)  specified  as  prognostic  tests  a  completion 
test,  a  letter  for  letter  substitution  test,  and  a  checking  of  misspelled 
words test, but made no note of any experimental work or actual
results. 

A carefully planned effort to construct a criterion measure of proficiency
in typing was reported by Toops (1923, pp. 133-134) upon
N  =  37  business  college  students  in  a  study  of  the  predictive  value 
of  tests  which  resulted  in  the  I.  E.  R.  General  Clerical  Test.  The 
criterion  was  constructed  from  the  following  five  variables  which 
were  weighted  in  accordance  with  the  "compromise"  of  the  estimates 
of  their  importance  made  by  two  examiners:  (1)  number  of  days 
required  to  complete  lessons  1-10  of  the  copy  text  book,  (2)  the 
number  of  days  required  for  lessons  11-20,  (3)  the  number  of  days 
for  lessons  21-30,  (4)  the  average  monthly  school  marks,  and  (5) 
the  average  of  two  independent  rankings  by  the  teacher  for  "potential 
ability" in typing. The "compromise" weightings for these five
elements were 4, 5, 3, 3, and 5, respectively. The product moment
r's  for  32  tests  ranged  from  —.15  to  .55,  and  for  chronological  age 
and  school  grade  completed,  r's  were  .08  and  .45,  respectively.  The 
r's  which  he  obtained  for  the  ten  I.  E.  R.  tests  (which  are  to  be 
reported  upon  in  the  present  study)  were  .22,  .20,  .29,  .30,  .44,  .55, 
.21,  .42,  .22,  and  .21.  Toops  (pp.  93-94)  also  reported  results 
obtained  upon  another  group  composed  of  N  =  28  experienced 
typists  in  employment.  The  criterion  consisted  of  the  routine 
semi-annual  ratings  made  by  the  superintendents  by  means  of  the 
Scott  rating  scale  plan  as  a  basis  for  promotions  in  salary.  For  the 
total  score  in  the  ten  I.  E.  R.  tests  weighted  in  accordance  with  the 
regression coefficients obtained upon the data from the business
college experimental group the product moment r was .02±.13.
For the ten tests considered separately the following r's were
obtained: .15, .17, -.18, .00, .04, .23, -.15, -.02, -.18, and .10.
Other correlations were as follows: Woodworth-Wells number group
checking, .05; finding addresses (Thorndike), .17; speed of writing
the  consecutive  numbers  from  11  upwards,  —.07;  checking  "same- 
different"  numbers  from  the  Army  Beta  test,  .01;  name  filing 
(Thurstone),  —.16;  another  "same-different"  numbers  test,  .15;  and 
school  grade  completed,  .25. 
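The "±" values attached to several coefficients in this review are probable errors. The source does not state how they were computed; the sketch below assumes the conventional formula of the period, P.E. = .6745(1 - r^2)/sqrt(N), which reproduces the .02±.13 quoted for Toops' employed group (N = 28) and, taking N = 6, the .92±.04 quoted for Lough's data. The function names are hypothetical.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Probable error of a product moment r (assumed formula: 0.6745 * (1 - r^2) / sqrt(N))."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


def ratio_to_probable_error(r: float, n: int) -> float:
    """How many probable errors the coefficient r amounts to."""
    return abs(r) / probable_error_r(r, n)


print(round(probable_error_r(0.02, 28), 2))  # 0.13, as quoted for Toops' employed group
print(round(probable_error_r(0.92, 6), 2))   # 0.04, as quoted for Lough's data
```

On this basis a statement that a coefficient is "seldom as large as twice its probable error" can be checked with `ratio_to_probable_error`, given the r and N concerned.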

Upon  a  group  of  30  typists  in  an  office  of  a  large  Education 
Authority  in  London,  Burt  (1922)  found  that  the  correlation  of  a 
special  subject  matter  test  in  typing  with  imputed  proficiency  in 
typing was .60. Against this subject matter test, which was composed
of (1) typing from memory, (2) typing from copy, (3) accuracy
in typing a "fair copy" from copy previously marked for correction
in the manner of printers' proof, (4) arrangement for display of an
announcement, and (5) tabulating and transcription of very illegible
manuscript, the following correlations were reported for the psychological
tests: mixed sentences, .37; opposites, .35; arithmetic, .28;
completion, .46; definition of simple technical words, .27; synonyms,
.33; analogies, .31; and spelling, .44.

Upon  a  group  of  28  typists  in  the  French  Typing  Branch  of  the 
International  Labour  Office  at  Geneva  engaged  in  homogeneous 
work  and  "employed  more  exclusively  on  typewriting  than  is  the 
case  in  the  majority  of  offices,"  Bieneman  (1923)  employed  as 
criterion  measures  the  independent  rankings  by  two  supervisors. 
The  following  correlations  (presumably  by  one  of  the  rank  order 
methods)  were  reported  for  each  of  the  21  tests  employed  against 
each supervisor's rankings: spelling (of continuous unpunctuated
text), .63 and .62; punctuation (of same), .47 and .42; number of
permutations of letters a, b, c, d in one minute (Claparede), .42 and
.59; cancellation, quality, of forms (Toulouse and Pieron, Barcelona
form), .42 and .47; form recognition (Whitley), .39 and .30; "sentence
completion" (reading of sentences in which all vowels were omitted),
.35 and .44; cancellation, quantity, of forms (v.s.), .32 and .39;
memory for phrases (reproduction of sentences), .30 and .62; reaction
time, auditory and visual, % of average variation (d'Arsonval
chronometer), .26 and .09; time of arranging six graduated weights,
.24 and .21; tapping, right hand (key and counter), .20 and .22;
tactile sensibility (i.e. two-point threshold), .18 and .26; cutting
between converging wavy lines, .18 and .48; manual asymmetry (?)
.17  and  .15;  memory  (15  words),  .11  and  .25;  speed  of  writing,  .10 
and  .37;  tapping,  left  hand  (v.s.),  .06  and  -.03;  reaction  time,  mode 
(v.s.),  .05  and  -.04;  mixed  sentences  (oral,  to  be  rearranged  into 
logical  sentences),  .04  and  .32;  and  spatial  estimation  (i.e.  space 
required for a letter from appearance of the manuscript), -.05 and .01.

Muscio and Sowton (1923) reported an experiment upon eleven
groups  of  typists,  the  N's  being  10,  12,  14,  14,  18,  20,  22,  22,  24,  30, 
and  60.  Five  groups  were  composed  of  students  and  six  groups  were 
in  employment.  The  criterion  measures  for  various  groups  were 
instructors'  and  supervisors'  ratings,  school  examinations  in  typing 
performance  and,  in  one  case,  in  "general  knowledge,"  and,  in  three 
groups,  the  quotient  obtained  by  dividing  the  number  of  hours  of 
practice  required  by  an  individual  to  attain  his  speed  by  the  average 
time  required  by  the  group  to  reach  that  speed.  In  all,  thirty-four 
tests  of  all  kinds  were  administered  to  one  or  more  of  the  groups. 
Correlation  coefficients  (rho's)  to  the  number  of  124  were  presented, 
the more important of which may be summarized as follows in order
of decreasing average magnitude: immediate memory (sentences),
mean of seven coefficients = .43, range .04 to .72; spelling (Link and
Burt),  mean  of  eight  coefficients  =  .40,  range  .13  to  .81;  directions 
(modeled  upon  Pintner  and  Toops),  mean  of  seven  coefficients  =  .30, 
range  .13  to  .72;  "touch  cards,"  i.e.  the  time  required  to  sort  with 
eyes  closed  40  playing  cards  with  two  to  four  small  holes  punched 
in  them  close  together  near  the  center,  three  coefficients  =  .46,  .44, 
and  .10;  Woodworth-Wells  verb-object,  two  coefficients  =  .43  and 
.23;  finding  products  of  pairs  of  two-place  numbers  from  a  table 
(Thorndike),  mean  of  seven  coefficients  =  .31,  range  —.40  to  .62; 
completion  (Trabue),  mean  of  seven  coefficients  =  .31,  range  .08 
to  .48;  "analyzing  table"  (Cody),  mean  of  seven  coefficients  =  .26, 
range -.17 to .68; "discrimination (sensory)," i.e. writing S or D
between same or unlike pairs of numbers, of groups of letters, of
names, etc., two coefficients = .40 and .12; opposites (Woodworth-Wells
easy opposites and Trabue and Stockbridge graded opposites),
mean of five coefficients = .18, range .00 to .33; substitution
(Woodworth-Wells and Muscio), mean of ten coefficients = .16, range -.20
to .59; action-agent (Woodworth-Wells), two coefficients = .27 and
.00; cancellation (Woodworth-Wells and Muscio), mean of five
coefficients = .11, range -.23 to .26; equivalence in grip of right and
left  hand  by  Smedley  dynamometer,  mean  of  four  coefficients  =  .11, 
range  —.50  to  .41;  discrimination  of  meanings,  on  the  principle  of 
Army  Alpha  test  4,  mean  of  four  coefficients  =  .10,  range  .00  to  .32; 
and immediate memory (digits), mean of four coefficients = .08, range
.05 to .16. For only one test (verb-object) was the obtained mean
value three times as large as its "mean variation." For Army
Alpha correlated against the results of a school examination in
typing from copy upon two small groups, N = 12 students of about
seven months' training and N = 14 students of about eleven months'
training, the reported rho's were as follows: for total Alpha, .20 and
.26; and for the eight elements of Alpha, .15 and .10, .21 and .45,
.27 and .77, .00 and .00, .46 and .41, -.40 and .69, .00 and .60, and
.29 and .00. Stanford-Binet I.Q.'s were obtained upon 10 experienced
typists engaged in various types of office work. Correlation
coefficients could not be calculated upon this group, but there was
evidence of a positive correlation of S-B I.Q. with the level of
difficulty and the quality of the work.

The  recent  experiment  of  Gronert  (1925)  reverts  to  the  primitive 
methods  of  Lough  and  Lahy.  Teachers'  marks  served  as  a  criterion, 
and  only  one  test  was  employed,  a  letter-digit  substitution  test. 
Efficiency  in  the  test  was  ascertained  both  from  speed  in  three 
applications  of  the  test  and  from  capacity  to  improve  in  successive 
trials.  Positive  correlation  is  demonstrated  by  comparing  the 
average  test  results  achieved  by  superior  typing  students  and  those 
achieved  by  mediocre  and  inferior  students. 

Hull  and  Limp  (1925)  reported  the  correlation  of  Hoke's  Group 
Prognostic  Test  of  Stenographic  Ability  with  typewriting  to  be  .22, 
N  =  107  first  year  high  school  students.  Further  details  of  this 
study  were  not  yet  published  at  this  time  of  writing. 

Many  of  the  foregoing  studies  reported  higher  coefficients  varying 
between  .5  and  .7  for  various  teams  of  tests,  calculated  either  as 
unweighted  sums  or  averages  of  the  scores  in  the  most  favorable 
tests, or as regular "multiple correlations." The writer is not
convinced however that in the case of the small experimental
populations upon which the results usually were obtained, such multiple-R's
have  much  meaning.  Yule  (1907,  p.  193  and  1922,  pp.  248-249) 
shows  that  upon  variables  totally  uncorrelated  a  multiple-R  due  to 
purely chance fluctuations may arise as high as

[formula not preserved in this copy]

wherein n = the number of variables considered and N = the
number  of  individuals  measured.  Therefore,  with  an  increase  in 
the  number  of  test  variables  and  a  decrease  in  the  population  of 
the  experimental  group,  the  multiple-R  may  take  increasing  values 
until,  as  a  limiting  case,  when  the  number  of  test  variables  becomes 
approximately  equal  to  the  number  of  cases  observed  the  multiple-R 
must  approach  unity. 
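
The limiting behavior described above can be checked by simulation. The following is only a sketch under stated assumptions: the criterion and every test are independent normal noise, n counts the criterion together with the tests, and the chance value is taken to be the familiar expectation of R squared, (n - 1)/(N - 1). The function names are illustrative and are not taken from Yule.

```python
import math
import random


def multiple_r_squared(y, predictors):
    """R^2 of y on the predictors (intercept included), via Gram-Schmidt."""
    def centered(v):
        m = sum(v) / len(v)
        return [x - m for x in v]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    yc = centered(y)
    basis = []  # orthonormal basis of the centered predictor space
    for col in predictors:
        v = centered(col)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-9:  # skip columns that add no new direction
            basis.append([vi / norm for vi in v])
    explained = sum(dot(yc, b) ** 2 for b in basis)
    return explained / dot(yc, yc)


def mean_chance_r_squared(n_vars, n_cases, trials=400, seed=1):
    """Average R^2 when the criterion and all n_vars - 1 tests are pure noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n_cases)]
        tests = [[rng.gauss(0, 1) for _ in range(n_cases)]
                 for _ in range(n_vars - 1)]
        total += multiple_r_squared(y, tests)
    return total / trials


# Three noise tests and a noise criterion on 20 cases: compare with 3/19.
print(mean_chance_r_squared(4, 20), 3 / 19)
# As many variables as cases: the multiple-R is forced to unity.
print(mean_chance_r_squared(8, 8, trials=20))
```

With as many variables as cases, the tests span every possible pattern of deviations in the criterion, so a perfect "fit" is guaranteed whatever the data.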

An  analysis  of  the  previous  experimentation  from  the  standpoint 
of  criterion  measures  shows  their  great  variety.  They  may  be 
classified as follows:

(1) Subjective ratings by instructor or by supervisor.

(2)  Teachers'  marks. 

(3) Scores upon copying and other tests, standardized and
unstandardized.

(4)  Actual  production  of  typed  papers  by  typists  in  employment 
(Rogers). 

(5)  Time  required  to  achieve  a  perfect  copy  of  a  given  assignment 
(Brewington)  or  to  cover  a  certain  section  in  the  text  book 
(Toops). 

(6)  Measures  based  upon  the  relative  rate  of  learning  in  a  typing 
class  (Muscio  and  Sowton). 

(7)  Weighted  composite  of  1,  2  and  5  (Toops). 

Over half of the experimenters employed objective measures based
upon copying tests (3). Next in popularity were subjective ratings (1).

The types of test variables represented may be classed as follows:

(1)  Tests  of  the  "intelligence  test"  type,  including  arithmetic 
and spelling.

(2)  Simple  "association"  tests. 

(3)  Tests  of  "motor  capacity,"  which  may  be  subdivided  into  (a) 
"pencil-and-paper"  tests  and  (b)  instrumental  tests. 

(4)  Ratings  of  character  traits  (Ohmann). 

(5)  Non-laboratory  variables,  such  as  chronological  age  (Robinson, 
Toops)  and  school  grade  completed  (Toops). 

Of these, the simple "association" and "motor" tests were the
most frequently used, with some form of substitution test an outstanding
favorite.

The  principles  guiding  the  trial  selection  of  test  variables  among 
different investigators may be classified as follows:

(1) The selection of tests which would seem to measure the psychological
functions upon which proficiency in typing appears to
depend.

(2)  A  more  or  less  empirical  selection  of  tests  and  other  variables, 
the  proof  of  which  would  rest  entirely  upon  correlational 
results. 

(3)  A  selection  of  tests  which  have  shown  promise  under  other 
investigators. 

A  summary  of  the  sizes  of  the  experimental  groups  upon  which 
the correlation coefficients were calculated reveals an important
source of weakness in the body of accumulated coefficiential evidence
of the correlation of test variables with proficiency in typing. Thirty-eight
fairly distinct groups have been reported upon by fourteen
investigators. In two of these groups the populations were 10 or
less, in 13 groups N's ranged from 11 to 20, and in 13 groups N's
ranged from 21 to 30. The larger populations were as follows: 38,
about 40, 60, 62, 65, 107, 124, 137, 225 and 255. Thus, twenty-eight
of these groups were composed of 30 or fewer individuals. A consideration
of probable error tables (e.g. Holzinger, 1925, and Kohs,
1923,  pp.  297-299)  shows  how  little  is  the  meaning  attached  to  a 
single  coefficient  in  the  great  majority  of  the  reports  which  have 
been  reviewed  in  the  preceding  pages.  Conventionally  a  product 
moment  r  should  be  at  least  three  times  as  large  as  its  probable  error 
in  order  to  be  considered  "significant."  Statistically  this  means 
that  if  actually  a  pair  of  variables  were  uncorrelated,  an  obtained 
positive  coefficient  as  large  as  +  3  P.E.  or  larger  would  appear  by 
pure  chance  only  once  in  about  46  trials.  Accordingly,  when  N  =  20 
the  obtained  r  must  be  at  least  .45,  and  if  N  =  30  it  must  be  at  least 
.37,  in  order  that  the  probability  that  it  is  merely  the  result  of  a 
chance fluctuation may be reduced to this extent. If the obtained
r's upon these populations are only .30 and .25, respectively, there
is a probability of about one in eleven that these values have arisen
purely from chance fluctuations in material which is actually
uncorrelated. Similar instructive calculations may be made for the
case  of  material  wherein  the  actual  correlation  may  be  positive  but 
so  low  as  to  be  of  little  practical  utility. 
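
The figures quoted in this paragraph can be reproduced with a few lines of arithmetic. The sketch below assumes the usual normal approximation for uncorrelated material, namely a standard error of r equal to 1/sqrt(N) and a probable error equal to 0.6745 standard errors; the function names are illustrative only.

```python
import math

PE_FACTOR = 0.6745  # one probable error expressed in standard errors


def min_significant_r(n_cases, multiple=3.0):
    """Smallest r equal to `multiple` probable errors when the true r is zero."""
    return multiple * PE_FACTOR / math.sqrt(n_cases)


def chance_odds(r, n_cases):
    """One-in-how-many chance that uncorrelated material yields a value >= r."""
    z = r * math.sqrt(n_cases)  # r in standard-error units (sigma_r = 1/sqrt(N))
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))
    return 1.0 / upper_tail


for n in (20, 30):
    print(n, round(min_significant_r(n), 2))   # thresholds for N = 20 and N = 30

print(chance_odds(min_significant_r(20), 20))  # odds against +3 P.E. by chance
print(chance_odds(0.30, 20), chance_odds(0.25, 30))
```

Under these assumptions the thresholds come out at .45 and .37, the chance of reaching +3 P.E. at roughly one in 46, and the chances for r = .30 (N = 20) and r = .25 (N = 30) at somewhat better than one in twelve, in agreement with the text.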

This  fact  of  statistical  unreliability  may  account  for  a  large  part 
of  the  marked  disagreement  among  the  results  obtained  by  different 
investigators,  and  even  by  the  same  investigator,  from  the  use  of  a 
given  test.  Other  factors  sharing  the  responsibility  for  the  equivocal 
results  may  be  the  low  reliability  of  the  criterion  measures  and  tests 
themselves,  the  differences  in  experimental  procedure  and  in  the 
variation  of  the  experimental  groups.  Although  the  general  tendency 
is  unquestionably  a  positive  correlation  between  proficiency  measures 
and  test  scores,  it  is  impossible  to  venture  even  a  general  estimate 
of  its  magnitude  because  of  a  selective  factor  in  the  very  reporting 
of  experimental  results  in  that  some  contributors  have  unfortunately 
omitted  detailed  mention  of  indifferent  results  in  the  mistaken 
belief that they are not of scientific interest. For the smaller the
populations and the less reliable the measures, the greater the
possibility for occasional extreme values of the correlation coefficient to
appear. By merely increasing the number of variables to be assayed
under such conditions, and by reporting only the most "favorable"
of  the  obtained  values,  an  experimenter  may  be  able  to  present  a 
collection  of  impressive  coefficients  which,  however,  subsequent 
investigators  using  the  same  tests  may  be  unable  to  corroborate. 
When  the  conditions  of  a  correlational  experiment  are  manifestly 
inadequate, safety lies only in reporting all of the obtained results.


CHAPTER II

The  Experiment 

A  Preliminary  Investigation 

At  the  time  when  the  present  experiment  was  begun  in  1922  the 
only  previous  contributions  available  were  those  of  Lough,  Lahy, 
Rogers,  Link  and  Poffenberger. 

A preliminary experiment was made upon groups ranging in size
from 22 to 150 among students in the commercial classes of Columbia
University who had had one and two years of training. The criterion
measures were "net words per minute" in twelve four-minute
speed tests upon material from the Underwood Credential Typewriting
Tests, errors being penalized ten words. A similar objective measure
of proficiency in stenography was attempted on the basis of the
speed of transcribing from notes material which had been dictated
at three different rates of speed. Tests were chosen which should
sample a wide range of psychological functions: Army Alpha; number
group checking, digit cancellation, form substitution, and constant
increment from the Woodworth-Wells (1911) series; word building:
a-e-o-b-m-t (Whipple, 1914, Vol. II, p. 275); serial association
(Whipple, 1914, Vol. II, pp. 44-53); "Marble Statue" (Whipple,
1914, Vol. II, p. 209) and "Golden Goose" (Pyle, 1913, p. 12);
straight maze (Whitley, 1911, pp. 87-91); pencil-and-paper tapping
for each hand (Sullivan, 1922, p. 25); and speed and quality of
handwriting.

The  conclusions  indicated  from  a  survey  of  the  data  obtained  upon 
these  measures  were  that  the  study  should  be  confined  to  typing, 
and  that  a  more  carefully  controlled  procedure  was  necessary  in 
order to obtain results more susceptible to interpretation. Accordingly,
in the organizing of the major investigation the following points
were  considered: 

(1)  The  choice  of  an  experimental  population  more  representative 
of  typists  in  general. 

(2)  Increasing  the  size  of  the  experimental  group. 

(3)  The  choice  of  a  more  homogeneous  copying  material  in  order 
to  make  a  more  fundamental  study  of  the  elements  entering 
into  an  objective  criterion  measure. 

(4) The addition of teachers' numerical terminal marks as a
criterion. 

(5) A more conservative choice of test variables from the standpoint
of reliability, objectivity and ease of administering.

(6) The giving of each test twice to the same groups in order to
ascertain the factor of reliability.

The  Experimental  Data 

The  data  for  the  major  experiment  were  obtained  during  April, 
May and June, 1922 upon 304 unselected female students in Julia
Richman  High  School,  New  York  City,  who  were  completing  their 
third  and  last  year  in  typewriting  training  in  a  commercial  course. 

The  typing  measures  consisted  of  48  minutes  of  speed  typing 
from  plain  copy,  divided  into  twelve  four-minute  tests  given  three 
at  a  sitting  on  the  four  days,  May  24,  June  5,  7,  and  9.  After  each 
single  four-minute  test  was  completed  it  was  given  an  initial  scoring 
by  the  student  herself,  who  noted  the  gross  words  and  marked  the 
errors.  This  served  as  a  rest  interval,  and  as  a  means  of  encouraging 
the  students  to  maintain  a  maximum  effort.  The  question  as  to 
whether  motivation  was  uniform  cannot  be  answered.  Although 
the  subjects  gave  every  appearance  of  working  at  a  maximum  effort, 
the typing tests on successive days showed steady though slight
average  decreases  in  the  number  of  gross  words  and  increases  in 
the  number  of  errors  of  all  kinds.  In  the  psychological  tests,  on  the 
contrary,  there  was  in  almost  every  test  an  average  improvement 
in  score  upon  the  second  giving. 

The material (Kimball, 1915) was from the copy used in the International
Typewriting Contests held in New York October 25, 1915.
It  is  made  up  of  familiar  short  words,  averaging  about  4.73  strokes 
per  word  as  strokes  are  counted  in  accordance  with  the  International 
Typewriting  Contest  Rules  (see  Underwood,  etc.,  1925,  pp.  12-16). 
The  phraseology  differs  greatly  from  that  of  the  usual  business 
correspondence. It is a rambling narrative of a fishing trip with
numerous parenthetical personal observations on the incidents
thereof, abounding in long sentences of which the following excerpt
is a not unfair example:

"The  first  thing  you  have  to  do  when  you  set  out  to  go  fishing  is  to  get  your 
bait,  or  to  speak  by  the  card  to  dig  your  bait  for  that  is  what  you  do,  you  dig  for 
it.  You  take  your  shovel  and  go  out  back  of  the  house  where  the  water  runs 
from  the  sink  drain  and  you  spend  your  time  hunting  for  things  which  you  never 
hunt  for  under  any  other  circumstances,  and  those  things  are  worms.  I  never 
do it without something of a blush mantling my fair brow for I always think what
my friends would have to say if they could see a man of parts, such as some of
them take me for who do not know me very well, out digging in the ground for just
worms, not fancy worms mind you, but plain every-day worms, such as no man
in his five senses would deem worthy of his notice."

Upon  inspection  the  Kimball  material  appeared  to  be  of  uniform 
difficulty in all parts in respect to phraseology and "stroke-intensity."
In the Blackstone (1923) copying tests, homogeneity in stroke-
intensity  is  assured  by  a  careful  construction  of  the  material  (v.  s.) 
but  unfortunately  his  tests  were  not  yet  published  at  the  time  of  this 
study. 

The  3648  four-minute  papers  forming  the  basis  of  the  objective 
proficiency  measures  were  first  scored  by  the  students  themselves, 
and  were  later  scored  de  novo  by  the  writer  himself,  who  noted  the 
actual  gross  or  attempted  words,  the  number  of  errors  as  defined  in 
the  current  International  Typewriting  Contest  Rules,  the  number  of 
omitted  words,  and  the  number  of  repeated  and  added  words. 
After  this  scoring  was  completed,  the  first  third  of  the  papers  were 
scored  a  second  time  by  the  writer,  and  another  third  received  a 
second  scoring  by  the  writer's  wife.  This  repeated  scoring  appeared 
to  insure  a  high  degree  of  accuracy  for  the  error  scores  which  was 
necessary  in  view  of  their  relatively  low  reliability. 

Four criterion measures were employed:

(1)  A  "speed"  criterion,  i.  e.,  the  number  of  gross  or  attempted 
words. 

(2) An "accuracy" criterion, i. e., the ratio of the difference between
attempted words and errors to the number of attempted
words.

(3) A combined "speed and accuracy" criterion, i. e., the results
obtained by scoring the papers in accordance with the International
Typewriting Contest Rules.

(4)  A  criterion  based  upon  the  average  of  the  six  semester  final 
grades  in  typing  classes  earned  during  the  entire  three  years' 
course. 

The reliability coefficients and intercorrelations of these four criteria
were obtained upon the total group of N = 304 cases. More
will  be  said  in  explanation  and  justification  of  the  choice  of  these  four 
measures  in  later  sections. 
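
As a rough illustration of the three objective criteria, the sketch below scores a single hypothetical four-minute paper. The ten-word deduction per error is the penalty this monograph attributes to the International Typewriting Contest Rules; the simplified net-words function, the function names and the sample figures are assumptions made for the example and do not reproduce the full Rules.

```python
def speed_criterion(gross_words):
    """Criterion (1): gross or attempted words."""
    return gross_words


def accuracy_criterion(gross_words, errors):
    """Criterion (2): (attempted words - errors) / attempted words."""
    return (gross_words - errors) / gross_words


def itc_net_words(gross_words, errors, penalty=10):
    """Criterion (3), simplified: gross words less a fixed penalty per error."""
    return gross_words - penalty * errors


# Hypothetical paper: 200 attempted words with 8 errors.
gross, errors = 200, 8
print(speed_criterion(gross))             # 200
print(accuracy_criterion(gross, errors))  # 0.96
print(itc_net_words(gross, errors))       # 120
```

The example shows how differently the measures treat the same paper: eight errors lower the accuracy ratio by only four per cent, yet remove forty per cent of the gross words under the ten-word penalty.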

For the purpose of treating the data, the 304 cases have been
divided into five groups, A, B, C, D, and E, upon the basis of the
psychological tests taken, all of which are "solid" except for the
correlations involving the variable "student activities." This division
was due entirely to the administrative requirements of the
school, and all groups may be considered equivalent and representative
in respect to typing training and proficiency.

The  psychological  tests  and  the  time  of  giving  and  their  populations 
are listed below:

Army  Alpha  (see  Yerkes,  ed.,  1921,  pp.  219-234  and  157-162), 
Form  5,  was  given  on  April  25  and  Form  7  on  April  28  to  groups 
A + B, N = 123.

The  I.  E.  R.  General  Clerical  Test  (see  Toops,  1923,  pp.  63-99) 
was  given  on  April  28  and  an  unpublished  alternative  form  June  8  to 
groups  A  +  B,  N  =  123. 

Three  tests  from  the  Woodworth-Wells  (1911,  pp.  29-41;  24-29; 
and 53-55) series were also given: number group checking, two applications,
keys 8-9 and 6-7, time limit, 120 seconds; digit cancellation,
two  applications,  keys  6  and  8,  time  limit  100  seconds;  and  form 
substitution,  two  applications,  keys  2-5-4-1-3  and  3-1-5-2-4,  time 
limit, 100 seconds. The alternative Woodworth-Wells tests were
given two days apart during the class periods in which the typing
measures were taken. The results have been treated separately for
the two populations, Group A, N = 77, and Group C + D, N = 137,
since Group A had recently taken three or four forty-minute omnibus
group tests while Groups C + D were perhaps naive in respect to
group psychological tests. The results however did not indicate
that this caution was necessary since there were no significant
differences in the means or standard deviations for the two populations.

Otis Advanced Examination (Otis, 1920), Form B, was given on
June 30 by the school officials. This is the routine group intelligence
test given yearly to the graduating students. There was no alternative
form given for this test. Gross scores were used in the correlations
since among these students the influence of chronological age
was probably negligible. The results were treated separately for
Groups A + B, N = 123, and Groups C + E, N = 153 because
Groups A and B had recently taken four forty-minute group tests of
similar character. That this separation was advisable may be noted
from the difference between the means for the two groups, the respective
means being 160.2 and 117.7. The difference accordingly
was 42.5 and the probable error of this difference was only 1.47.

The  first  year  English  grades  (F.  Y.  E.  G.),  i.  e.,  the  teachers' 
numerical  marks  in  the  English  classes  earned  in  the  two  semesters 
of  the  first  year  (two  years  previous  to  the  year  when  the  typing 
measures  were  taken)  were  tabulated  upon  the  total  N  =  304  cases. 
This  variable  was  added  in  order  to  compare  the  predictive  value  of 
former school marks with that of psychological tests.

Chronological ages (C. A.) at the time of the experiment, N = 304,
also were considered. Though ostensibly a "static physical character,"
C. A. in this case where the subjects become selected chronologically
on the basis of their success in progressing through a school
course assumes the character of the measure of a complex trait
which may be called "over-ageness."

A  unique  variable  designated  as  "student  activities"  was  obtained 
by adding the number of activities in the student and school organization
in which the student participated during her three year course
as  listed  in  the  "Commencement,  1922"  number  of  the  "Bluebird," 
the  students'  publication.  The  unit  chosen  was  "one  activity  for 
one  semester."  More  than  a  hundred  distinct  activities  were  noted, 
ranging  from  "president  of  the  general  organization"  or  "editor  of 
the  'Bluebird'"  or  "secretary  to  Mr.  X  (a  department  head)"  down 
to  "lunch  marshal"  or  "glee  club"  or  "basketball  club."  Since  these 
units  were  by  no  means  equivalent  or  homogeneous,  their  statistical 
behavior  might  have  been  improved  by  some  system  of  weighting, 
but  since  it  was  apparent  that  the  students  holding  the  "high" 
offices  also  tended  to  participate  in  a  larger  number  of  the  smaller 
activities  this  absence  of  weighting  of  items  is  probably  no  more 
serious  than  in  the  analogous  case  of  the  items  of  the  usual  "group 
intelligence  test"  such  as  Army  Alpha  or  Otis.  The  scores  ranged 
from  0  to  20.  The  distribution  showed  a  marked  positive  skew, 
about  60%  of  the  cases  falling  between  1  and  5  activities,  inclusive. 
There  were  many  "experimental  casualties"  in  the  population  for 
this  variable.  Students  who  failed  to  achieve  graduation  at  this 
time,  and  those  who  neglected  to  provide  their  photographs  for  the 
"annual"  did  not  appear  therein,  and  those  who  had  not  spent  the 
previous  three  years  in  this  high  school  were  excluded  from  the  count, 
so  that  the  measure  could  be  obtained  finally  upon  only  N  =  240 
cases. 

Since the separate test elements of Army Alpha and of I. E. R.
are themselves psychological tests capable of independent treatment,
the list of test variables employed amounts to twenty-seven.
"Reliability coefficients" from repeated measures were obtainable
on all variables except Otis.

Among  these  variables  the  largest  number  are  of  the  type  which 
are  commonly  used  in  making  up  omnibus  group  "intelligence  tests." 
The  next  largest  number  may  be  described  as  "association  tests" 
which  seem  to  require  chiefly  a  quick  perception  and  psycho-physical 
coordination. Only one test (I. E. R. 3) is representative of "motor
tests." The three non-laboratory variables are measures of complex
traits which are not easy to analyze or identify.

Statistical  Treatment 

All  of  the  correlation  coefficients  were  computed  by  some  equivalent 
of  Pearson's  product  moment  formula  except  where  otherwise  noted. 
Upon  smaller  populations  the  familiar  gross  score  formula 

$$r = \frac{N\sum XY - \sum X \cdot \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$

was  employed.  A  "job-analyzed"  description  of  the  method  is 
given  by  Toops  (1922).  This  method  is  more  economical  of  time 
upon  small  populations  and  probably  more  suitable  for  obtaining  a 
perfect  arithmetical  accuracy  than  the  usual  scatter-diagram  method 
(Yule,  1897,  pp.  822-827),  but  has  a  serious  disadvantage  in  not 
requiring  the  drawing  up  of  a  scatter-diagram  whereby  one  may  see 
the  behavior  of  the  variables  in  a  correlation  table.  For  the  more 
important correlations, and in cases of dubious results, the coefficients
were calculated de novo by the scatter-diagram method.
The  groupings  of  raw  scores  were  into  18  or  more  step-intervals 
except  where  the  range  of  actual  gross  scores  was  smaller. 
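
For readers who wish to retrace the arithmetic, the gross score formula can be written out directly from the raw sums. This is a minimal sketch and is not Toops' "job-analyzed" routine; the function name and the sample scores are hypothetical.

```python
import math


def gross_score_r(xs, ys):
    """Pearson r computed from raw sums: N, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_xx - sum_x ** 2)
                   * math.sqrt(n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Hypothetical scores for four subjects on two measures.
print(gross_score_r([1, 2, 3, 4], [1, 3, 2, 4]))  # close to 0.8
```

Because only six running totals are needed, the method suits hand or machine computation on small groups, but as the text observes it never forces the computer to look at the scatter of the paired scores.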

The  coefficients  presented  in  the  tables  are  to  the  nearest  second 
decimal  place,  but  in  the  calculations  in  which  they  were  involved 
they  were  usually  carried  to  the  nearest  fourth  place. 

In  order  to  ascertain  the  correlations  between  a  criterion  and  the 
average (or sum) of the scores from two applications (or forms)
of a test the following special form of the Spearman sums-and-differences
formula (Spearman, 1913, p. 419, or Kelley No. 147*) was
employed

$$r_{1(A+B)} = \frac{\sigma_A r_{1A} + \sigma_B r_{1B}}{\sqrt{\sigma_A^2 + \sigma_B^2 + 2\sigma_A\sigma_B r_{AB}}}$$

For  the  correlation  of  various  combinations  of  gross  words,  errors, 
omitted  words  and  repeated  words  in  the  typing  measures,  other 
applications of the same procedure were used. These formulae
require  no  special  assumptions  and  the  coefficients  obtained  from 
their  use  are  identical  with  those  which  would  be  obtained  from  an 
original  working  of  the  data  within  any  predetermined  degree  of 
precision. 
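
The claim that such formulae agree with an original working of the data can be verified numerically. The sketch below assumes the standard form of the correlation of a variable with a sum, r_1(A+B) = (s_A r_1A + s_B r_1B) / sqrt(s_A^2 + s_B^2 + 2 s_A s_B r_AB), and compares it with a direct computation; all scores and names are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Product-moment r from deviations about the means."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def r_with_sum(r_1a, r_1b, r_ab, sd_a, sd_b):
    """Correlation of a criterion with A + B from the component constants."""
    numerator = sd_a * r_1a + sd_b * r_1b
    denominator = math.sqrt(sd_a ** 2 + sd_b ** 2 + 2 * sd_a * sd_b * r_ab)
    return numerator / denominator


# Hypothetical scores: a criterion and two applications (A, B) of one test.
criterion = [31, 35, 28, 40, 37, 44]
test_a = [10, 12, 11, 15, 13, 16]
test_b = [9, 13, 10, 14, 15, 15]

from_formula = r_with_sum(
    pearson(criterion, test_a), pearson(criterion, test_b),
    pearson(test_a, test_b), pstdev(test_a), pstdev(test_b),
)
direct = pearson(criterion, [a + b for a, b in zip(test_a, test_b)])
print(from_formula, direct)  # the two values agree to rounding error
```

The agreement is an algebraic identity, so it holds for any data; the saving lies in reusing coefficients already computed for the separate applications.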

* Refers to T. L. Kelley, Statistical Method, 1923.

In addition to the coefficients representing the obtained correlations
from a first and second application, and from the average of the two
applications of equivalent tests, coefficients were also calculated to
represent  a  "theoretical  maximum  correlation"  which  would  be 
obtained  if  the  two  series  of  measures  to  be  correlated  were  repeated 
an  infinite  number  of  times.  These  coefficients  were  calculated  by 
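the following formula:

$$\text{"TMC-}r\text{"} = \frac{r_{1(A+B)}}{\sqrt{\dfrac{2r_{AB}}{1+r_{AB}}}\,\sqrt{\dfrac{2r_{\frac{1}{2}\frac{1}{2}}}{1+r_{\frac{1}{2}\frac{1}{2}}}}}$$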

in which $r_{1(A+B)}$ is the correlation of the criterion with the average
(or sum) of the scores of the two applications of the test, $r_{AB}$ is the
correlation between the two applications of the test, i.e. the reliability
coefficient of the test, and $r_{\frac{1}{2}\frac{1}{2}}$ is the correlation of a half
of the criterion measures with the other half. The two terms of the
denominator, it will be noted, are the square roots of the Spearman-Brown*
reliability of the average of the two applications of the
test and of the total 48 minutes of typing, and also the whole expression
is an analogue of the Spearman formula for the "correction
for  attenuation"  (see  Spearman,  1904,  p.  90,  or  Yule,  1922,  p.  213). 
However,  since  the  formula  may  be  derived  from  the  Spearman 
sums-and-differences  formula  entirely  on  the  assumption  that  the 
obtained  r's  and  standard  deviations  are  equal  to  those  which 
would  appear  on  each  repetition  of  the  same  or  equivalent  measures 
(see  Kelley,  1923,  p.  204),  and  since  these  assumptions  are  by  no 
means satisfied in the data of this experiment with its small populations,
the description of these coefficients as "theoretical maximum
correlations"  in  this  case  rather  than  as  "corrected  for  attenuation" 
appears  more  defensible. 
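
A short numerical sketch may make the structure of the "theoretical maximum correlation" clearer. It assumes the denominator is the product of the square roots of the two Spearman-Brown reliabilities, as the text describes; the function names and the input coefficients are hypothetical and are not values from this experiment.

```python
import math


def spearman_brown(r_single, n=2):
    """Reliability of the average of n applications, given that of one."""
    return n * r_single / (1 + (n - 1) * r_single)


def theoretical_max_r(r_crit_avg, r_ab, r_halves):
    """Obtained r divided by the root reliabilities of test and criterion.

    r_crit_avg: criterion with the average of two test applications
    r_ab:       first test application with the second
    r_halves:   one half of the criterion measures with the other half
    """
    test_term = math.sqrt(spearman_brown(r_ab))
    criterion_term = math.sqrt(spearman_brown(r_halves))
    return r_crit_avg / (test_term * criterion_term)


# Hypothetical coefficients.
print(spearman_brown(0.60))                 # 0.75
print(theoretical_max_r(0.30, 0.60, 0.80))  # about 0.37
```

Since both reliabilities are below unity, the adjusted value always exceeds the obtained one, and the lower the reliabilities the larger the inflation, which is one reason for treating single adjusted coefficients with reserve.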

What meaning may be attributed to these theoretical coefficients?
Although the structure of the formula indicates that chance variations
in the size of the component r's tend to neutralize each other,
still the resultant coefficient is perhaps not sufficiently stable to give
much validity to any single coefficient. But when these coefficients
upon similar measures are considered in "batteries" for comparative
purposes, as in this experiment, and only general tendencies are
sought for, they possess some utility, since an obtained "corrected
for attenuation" coefficient is the "most probable" value of the
mean of the values which would result from repetitions of the procedure,
in accordance with the principle that a value which has been
obtained at least once is a better "bet" than any other value which
has not yet appeared.

* The formula

$$\frac{n r_{11}}{1 + (n-1) r_{11}}$$

by which one may compute the theoretical reliability of the average score resulting
from n applications of a test from the reliability obtained for one application
has been credited to Professor Spearman by some writers and to Dr. Brown by
others. Though derived independently by the two authors, its earliest appearances
in print were simultaneous, curiously enough in two consecutive articles
in the same number of the British Journal of Psychology, Vol. III, October, 1910,
page 281 and page 299.


CHAPTER  III 

Data  Concerning  the  Test  Variables 

In  Table  I  are  presented  the  means,  standard  deviations,  obtained 
reliability  coefficients  and  Spearman-Brown  reliability  coefficients  for 
all  the  test  variables.  A  special  note  is  necessary  regarding  the  means 
and standard deviations of I. E. R. total and I. E. R. 1. The weightings
of the component tests in obtaining a total score were those given
by  Toops  (1923,  p.  89)  for  the  experimental  mimeographed  forms  used 
in  this  study.  The  instructions  for  Test  1  have  also  been  modified 
as  he  notes.  While  the  constants  marked  with  an  asterisk  (*)  in 
Table  I  are  therefore  not  numerically  comparable  with  those  which 
would  be  obtained  from  the  final  form  of  the  I.  E.  R.  Clerical,  the 
correlation coefficients would be unaffected by these changes. The
data  mentioned  by  Toops  (ibid.)  were  upon  145  students  including 
our  present  population  of  123  cases,  the  remaining  22  cases  being 
"experimental  casualties." 

The  low  reliabilities  of  Army  Alpha  1  and  I.  E.  R.  4  are  probably 
due  to  the  coarseness  of  the  units  which  did  not  allow  sufficient 
"spread"  in  the  obtained  scores.  The  low  reliability  of  I.  E.  R.  7 
was  probably  due  to  the  fact  that  on  the  second  giving  the  time 
allowance  was  too  large,  either  because  the  alternative  form  was  not 
of  equal  difficulty  or  because  after  a  first  giving  the  students  become 
so  adept  in  this  sort  of  skill  that  the  psychological  character  of  the 
second  application  was  no  longer  the  same.  This  possibility,  which 
may  be  operative  in  many  tests  upon  repetition,  has  been  frequently 
mentioned  but  never  studied  experimentally.  The  low  reliability 
of the Woodworth-Wells form substitution test is probably to be
explained  on  the  basis  of  an  "interference  effect,"  since  in  the  papers 
of  the  second  giving  there  were  a  large  number  of  errors  arising  from 
substituting  the  digits  of  the  key  which  had  been  employed  only  two 
days  previously.  The  unusual  decreases  in  the  mean  scores  of  Alpha 
8  and  I.  E.  R.  6  from  the  first  to  the  second  application  cannot  be 
accounted  for. 

A  consideration  of  the  reliability  coefficients  suggested  that  their 
magnitude  bears  a  relation  to  the  rate  of  reactions  required  by  the 
respective  tests  as  measured  by  the  number  of  reactions  per  unit  of 
testing  time.  Since  most  of  the  original  test  papers  were  no  longer 
available it was necessary to use the mean scores in both tests as an
approximate measure of the number of reactions. For the 21 test
elements for which reliability coefficients were obtained (the mean
results of the two groups, N's = 77 and 137, for each of the three
W-W tests being employed) the correlation by Spearman's rho-formula
(Kelley No. 142) for the ratios of mean scores to time allotment
# the Spearman-Brown reliability coefficients* was .21. If the
unsatisfactory W-W form substitution coefficient were omitted, for
the reasons noted above, this correlation on N = 20 tests would be
.37. Although this coefficient is not large, and not very reliable because
of the small population, still in view of the crudity of the
measures  employed  its  magnitude  suggests  the  existence  of  a  problem 
which  may  be  of  real  importance  in  test  construction. 
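
The rank method used for this comparison may be sketched as follows. The code uses the usual rank-difference form of rho, 1 - 6(sum of d^2)/(N(N^2 - 1)), which is exact only when there are no tied ranks; whether this agrees term for term with Kelley's formula No. 142 is an assumption, and the rates and reliabilities shown are invented for illustration.

```python
def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(xs, ys):
    """Rank-difference correlation for untied data."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical data for five tests: reactions per second of testing time
# (mean score divided by time allotment) and Spearman-Brown reliability.
reactions_per_second = [0.4, 0.9, 1.5, 2.2, 3.0]
reliability = [0.55, 0.62, 0.70, 0.88, 0.81]
print(spearman_rho(reactions_per_second, reliability))  # close to 0.9
```

A rank coefficient is a natural choice here because the ratio of mean score to time allotment is only a crude index of reaction rate, and its ordering is more trustworthy than its absolute size.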

The  intercorrelations  of  the  major  test  variables  are  given  in  Table 
II.  Other  intercorrelations  upon  other  populations  are  as  follows: 
for Otis # F. Y. E. G., r = .17±.05, N = 153; for Otis # C. A., r =
.01±.05, N = 153; and for F. Y. E. G. # C. A., r = -.19±.04,
N = 304.

For  "student  activities"  the  intercorrelations  were  as  follows: 
with Army Alpha (A), .21±.06, N = 100; with Army Alpha (B),
.11±.07, N = 100; with I. E. R. (A), .14±.07, N = 100; with I. E. R.
(B), .09±.07, N = 100; with Otis, Groups A + B, r = .18±.07,
N = 100; and Groups C + E, r = .17±.06, N = 134; with F. Y. E. G.,
-.00±.04, N = 240; and with chronological age, -.03±.04, N = 240.
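
The "±" figures attached to these coefficients are probable errors. As a check, the sketch below assumes the conventional formula P.E. = 0.6745 (1 - r^2) / sqrt(N) and recomputes several of the values quoted above; the function name is illustrative.

```python
import math


def probable_error_of_r(r, n_cases):
    """Conventional probable error of a product-moment r."""
    return 0.6745 * (1 - r * r) / math.sqrt(n_cases)


# (r, N) pairs quoted in the text, with the probable error printed there.
quoted = [
    (0.17, 153, 0.05),   # Otis # F. Y. E. G.
    (-0.19, 304, 0.04),  # F. Y. E. G. # C. A.
    (0.21, 100, 0.06),   # student activities with Army Alpha (A)
    (0.11, 100, 0.07),   # student activities with Army Alpha (B)
]
for r, n, printed in quoted:
    print(r, n, round(probable_error_of_r(r, n), 2), printed)
```

On this assumption each recomputed value rounds to the figure printed in the text, which also helps in restoring digits where the printed coefficients are hard to read.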

The relatively low correlations of I. E. R. with both Alpha and
Otis would indicate that it is not merely a conventional "group intelligence
test," but one notes that its correlations with first-year English
grades are of about the same magnitude as those for the "intelligence
tests."

*  The  symbol  #  will  be  used  in  subsequent  pages  in  an  effort  to  state  briefly 
and  exactly  just  which  two  variables  are  concerned  in  the  correlation.  It  may 
be  thought  of  as  an  abbreviated  scatter-diagram. 


CHAPTER  IV 
The  Speed  Criterion 

The problem of the relative weightings to be assigned to the
speed and accuracy factors in arriving at a single score for a typed
paper has never been studied experimentally. The usual practice
is to resort to some arbitrary penalty. The present International
Typewriting Contest Rules (Underwood, 1925) provide for a
deduction of ten words for each error, with omitted, repeated, and
added words counted as an error apiece. In former years the I. T. C.
Rules assigned a penalty of only five words per error (see Rogers,
1922, p. 18).

Blackstone (1923) departs markedly from the I. T. C. Rules in
that his arbitrary scoring formula

\[ \text{Score} = \frac{\text{Strokes per minute} \times 10}{\text{Errors} + 10} \]

has the effect of imposing a sliding scale for the penalizing of errors
whereby the first error in a test is penalized about twelve to sixteen
words but each succeeding error is penalized by a decreasing amount.
Blackstone's method avoids the anomaly of less-than-zero scores
which occur occasionally under the I. T. C. Rules scoring.
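
The contrast between the two scoring rules can be made concrete with a short sketch. The function names and the sample figures are illustrative assumptions and do not come from either rule book; only the ten-word deduction and the Blackstone formula are taken from the text above.

```python
def itc_net_words(gross_words: float, errors: int, penalty: float = 10.0) -> float:
    """Net score under a fixed per-error deduction (ten words in the 1925 rules)."""
    return gross_words - penalty * errors


def blackstone_score(strokes_per_minute: float, errors: int) -> float:
    """Blackstone (1923): strokes per minute x 10, divided by (errors + 10)."""
    return strokes_per_minute * 10.0 / (errors + 10.0)


def blackstone_marginal_penalties(strokes_per_minute: float, max_errors: int) -> list:
    """Score lost by each successive error, from the first to the max_errors-th."""
    scores = [blackstone_score(strokes_per_minute, e) for e in range(max_errors + 1)]
    return [scores[i] - scores[i + 1] for i in range(max_errors)]


if __name__ == "__main__":
    # Hypothetical typist: 220 strokes per minute.
    for i, loss in enumerate(blackstone_marginal_penalties(220.0, 5), start=1):
        print(f"error {i}: score falls by {loss:.2f}")
    # A fixed deduction can go below zero; the Blackstone ratio cannot.
    print(itc_net_words(100.0, 11), blackstone_score(100.0, 50))
```

Each successive error costs less than the one before it under the Blackstone ratio, whereas the fixed deduction is linear and can drive a slow, inaccurate paper below zero.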

The writer has not been unmindful of the desirability and also
the difficulty of determining experimentally the best weighting or
penalty for errors. The problem hinges upon the establishing of a
satisfactory "ultimate criterion" of typing proficiency; if this were
available it would be possible to obtain a multiple regression equation
whose regression coefficients would afford a system of weighting
for errors with a fair underlying rationale.

In this study the two elements of speed and accuracy were treated
as separate objective criteria. This permitted a closer study of
these factors from the standpoint both of their interrelation and
of their relative possibilities of being predicted by psychological test
methods. As a means of combining these factors into one measure,
the scoring of the International Typewriting Contest Rules was
employed as a third criterion, partly because it represents the
conventional practice, and partly because the data of this experiment
were not extensive enough to suggest any other significantly different
method of weighting errors.

The speed criterion consists of the total or gross number of words
actually typed during the 48 minutes of speed testing. This differs
from the I. T. C. Rules (see Underwood, 1925, p. 15) in which gross
words are counted from the printed copy. This means that the
I. T. C. Rules make no note of the omission, repetition or addition
of words in the gross word count.

The  speed  criterion  would  not  differ  greatly  for  correlational 
purposes  from  one  based  upon  correct  words  only.  The  correlation 
of  the  correct  words  only  #  actual  gross  words  upon  the  papers  of 
the  first  plus  third  days'  testing  was  .996,  N  =  304,  and  that  upon 
the  papers  of  the  second  plus  fourth  days'  testing  was  .993,  N  =  304, 
and  for  all  four  days'  testing,  this  r  =  .995,  N  =  304. 

It will be noted that the present experiment is based upon words
as units while the I. T. C. Rules beginning with the World's
Championship Contest of October, 1924, held in New York are based upon
strokes as units (see Kimball, 1924). Since a large part of the data
had already been worked out upon the scoring system of the previous
I. T. C. Rules (see Kimball, 1921), the question arose as to whether
the labor of re-scoring the 3648 typed papers would be rewarded by
any significant gain in precision in the results. In order to decide
this point, a random alphabetical selection of the complete sets of
papers from 50 subjects was re-scored upon the new stroke basis,
and the correlation with the word count was calculated. (The
I. T. C. Rules stroke count is perhaps less precise than Blackstone's
(1923) system in that the striking of the shift key for capital letters
is not considered.) This correlation of I. T. C. stroke count # word
count was found to be .99876, N = 50. The probable error calculated
by the usual formula, which is perhaps not valid in the case of so
high an r upon so small a population (see Soper and others, 1917),
was ±.00024.

While the fact of this high correlation would ordinarily be
interpreted to mean that the two scorings were equivalent for purposes
of measurement, it can be shown statistically that our correlation
coefficients if calculated upon a stroke basis could not differ widely
from those obtained upon the word basis. For if the correlation of
a given test # word count ($r_{tw}$) and the correlation of the word
count # stroke count ($r_{ws}$) upon the same population are known, the
correlation of the given test # stroke count ($r_{ts}$) cannot fall outside
the limits given by Yule's (1922, p. 250) formula:

\[ r_{tw}\, r_{ws} \pm \sqrt{1 - r_{tw}^2 - r_{ws}^2 + r_{tw}^2\, r_{ws}^2} \]

If the conservative estimate is made that the correlation of word
count # stroke count upon the whole of our population would not
be lower than .998, the extreme divergences possible for the reported
test # speed coefficients may be read from the following tabulation
(the signs of the limits in case of negative values being reversed):


If $r_{tw}$ equals:     $r_{ts}$ must fall within the limits

      .00                  -.063   and   .063
      .05                  -.013    "    .113
      .10                   .037    "    .163
      .15                   .087    "    .212
      .20                   .138    "    .262
      .25                   .188    "    .311
      .30                   .239    "    .360
      .35                   .290    "    .409
      .40                   .341    "    .457
      .50                   .444    "    .554
      .60                   .548    "    .649
      .70                   .653    "    .744
      .80                   .760    "    .836
      .90                   .871    "    .926
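
The tabulated limits follow directly from Yule's formula with the word # stroke correlation fixed at .998. The sketch below recomputes them; the function name `yule_limits` is an illustrative choice, and the expression under the radical is written in its factored form, (1 - a^2)(1 - b^2), which is algebraically identical to the four-term expression.

```python
import math


def yule_limits(r_tw: float, r_ws: float) -> tuple:
    """Bounds on r_ts given r_tw and r_ws (Yule's limits for a third correlation)."""
    centre = r_tw * r_ws
    # 1 - a^2 - b^2 + a^2 b^2 factors as (1 - a^2)(1 - b^2).
    half_width = math.sqrt(max(0.0, (1.0 - r_tw ** 2) * (1.0 - r_ws ** 2)))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    for r_tw in (0.00, 0.05, 0.10, 0.20, 0.30, 0.50, 0.70, 0.90):
        low, high = yule_limits(r_tw, 0.998)
        print(f"r_tw = {r_tw:.2f}: {low:.3f} to {high:.3f}")
```

Because the interval is centred on the product of the two known coefficients, it narrows as the test # word correlation rises, which is why the limits for .90 are tighter than those for .00.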

A comparison was made of the reliabilities for the two scorings
upon these N = 50 cases. For the stroke count the obtained correlation
of the sum of the first plus third days' measures # the sum of
the second plus fourth days' measures, i.e. $r_{s_1 s_2}$, was .9151, N = 50,
and for the word count this correlation, i.e. $r_{w_1 w_2}$, was .9133, N = 50.
It is doubtful whether this difference is even "statistically significant."

The reliability coefficient for the speed criterion obtained by
pooling alternate days' scores upon the total N = 304 cases was
found to be .91±.01 and the Spearman-Brown reliability of the
total 48 minutes accordingly was .95. For the accuracy and the
I. T. C. Rules criteria the obtained reliabilities were .74±.02 and
.69±.02, respectively, and the corresponding Spearman-Brown
reliabilities were .85 and .82 respectively. Thus it will be noted that
gross words are the most reliable of these three criteria, or more
strictly, a given degree of reliability may be attained with a less
expenditure of time in measurement-taking. For a Spearman-Brown
reliability of .95 upon gross words could be obtained in 48
minutes of testing, while on the accuracy and I. T. C. Rules criteria
the amount of time necessary to attain this same reliability according
to the theoretical expectation of the Spearman-Brown formula
(Kelley No. 159)

\[ n = \frac{r_{nn}\,(1 - r_{11})}{r_{11}\,(1 - r_{nn})} \]

would be 160 minutes for the accuracy and 205 minutes for the
I. T. C. Rules criteria.
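
The arithmetic behind these figures can be checked with a short sketch of the Spearman-Brown formula in both directions. It assumes that each obtained coefficient is the correlation between two halves of 24 minutes each (alternate days pooled), which is how the text describes them; the function names are illustrative.

```python
def spearman_brown(r_unit: float, n: float) -> float:
    """Reliability of a measure n times as long as the unit with reliability r_unit."""
    return n * r_unit / (1.0 + (n - 1.0) * r_unit)


def length_factor(r_unit: float, r_target: float) -> float:
    """Number of unit lengths needed to raise r_unit to r_target."""
    return r_target * (1.0 - r_unit) / (r_unit * (1.0 - r_target))


if __name__ == "__main__":
    half_minutes = 24  # assumed: each half is two days' testing, 24 minutes
    for name, r_half in (("speed", 0.91), ("accuracy", 0.74), ("I. T. C. Rules", 0.69)):
        full = spearman_brown(r_half, 2)
        minutes = half_minutes * length_factor(r_half, 0.95)
        print(f"{name}: 48-minute reliability {full:.2f}, "
              f"about {minutes:.0f} minutes needed for .95")
```

With the obtained half-test coefficients of .74 and .69 this gives about 160 and 205 minutes, in agreement with the figures quoted.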

It will be shown in a subsequent section that a scoring which
ignores errors entirely allows the maximum reliability, and that a
progressive increase in the penalty (or reward) for errors will be
followed by a progressive decrease in reliability.

The reliability of the total three years' typing grades criterion
presents a much more serious problem. The reliability obtained by
pooling the grades for the three first semesters against those of the
three second semesters was .55±.03 with a Spearman-Brown
reliability of .71 for the average of the total three years of teachers'
semester grades. To obtain a reliability of .95 would theoretically
require the typing grades earned during 23 years, a condition which
is manifestly impracticable.

For the total 48 minutes of typing the mean number of gross
words was 2230.1, and the standard deviation, 237.8. Since the
"stroke intensity" of the copy was about 4.73, the mean and standard
deviation in terms of strokes were about 10,548.4 and 1124.8
respectively. This would mean a rate of about 46.5 gross words or
219.9 strokes per minute. For the frequency distribution of gross
words, see Table IV.


TABLE III

Criterion Correlations: Speed

                       N    First        Second       Both          Theoretical
                            application  application  applications  maximum
                                                                    correlation

Alpha total           123    .07±.06      .08±.06      .08±.06        .08
Alpha 1               123    .02±.06      .09±.06      .06±.06        .07
Alpha 2               123   -.07±.06     -.05±.06     -.07±.06       -.08
Alpha 3               123    .02±.06      .08±.06      .06±.06        .06
Alpha 4               123    .13±.06      .13±.06      .15±.06        .18
Alpha 5               123    .12±.06      .20±.06      .18±.06        .22
Alpha 6               123    .24±.06      .17±.06      .21±.06        .23
Alpha 7               123    .04±.06      .08±.06      .06±.06        .07
Alpha 8               123    .07±.06      .16±.06      .12±.06        .14
I. E. R. total        123    .22±.06      .29±.06      .27±.06        .28
I. E. R. 1            123    .09±.06      .21±.06      .16±.06        .18
I. E. R. 2            123    .37±.05      .35±.05      .39±.05        .43
I. E. R. 3            123    .23±.06      .34±.05      .30±.06        .32
I. E. R. 4            123    .00±.06      .05±.06      .03±.06        .04
I. E. R. 5            123    .20±.06      .24±.06      .24±.06        .27
I. E. R. 6            123    .09±.06      .15±.06      .13±.06        .15
I. E. R. 7            123    .16±.06      .27±.06      .24±.06        .29
I. E. R. 8            123    .18±.06      .19±.06      .20±.06        .23
I. E. R. 9            123    .14±.06     -.02±.06      .08±.06        .09
I. E. R. 10           123    .15±.06      .10±.06      .14±.06        .16
W-W n. g. c.           77    .30±.07      .28±.07      .31±.07        .33
  (ditto)             137    .20±.05      .13±.06      .17±.06        .18
W-W d. c.              77    .21±.07      .22±.07      .23±.07        .26
  (ditto)             137    .08±.06      .12±.06      .11±.06        .14
W-W subst.             77    .31±.07      .15±.07      .29±.07        .41
  (ditto)             137    .20±.05      .16±.06      .22±.05        .28
Otis                  123    .12±.06
  (ditto)             153    .11±.05
F. Y. E. G.           304    .12±.04                                  .14
Chron. Age            304   -.14±.04                                 -.15
Student activities    240    .20±.04                                  .24

The  correlations  of  the  speed  criterion  #  the  test  variables  are 
presented  in  Table  III. 

It will be noted that the coefficients tend to be definitely positive.
Furthermore the coefficients for the I. E. R. and W-W tests, which
are commonly "expected" to show correlation with typing
proficiency, do show a marked superiority over the tests of the
"intelligence" type as represented by Army Alpha and Otis. By a
combining of the most promising tests into a best-weighted composite
by means of multiple correlation methods it would not be impossible
to construct a "prognostic test" of some validity for the speed
element of typing.

An  objective  study  was  made  to  determine  whether  the  factors 
of  reliability  of  the  test  variable,  and  of  the  rate  of  reactions  in  a 
given  test,  bear  any  relation  to  the  magnitude  of  its  correlation 
with  the  speed  criterion.  Upon  the  21  test  elements  for  which 
reliability  coefficients  were  obtained  (the  mean  values  of  the  W-W 
coefficients  for  the  two  groupings  A  and  C  +  D  being  used)  the 
rho-correlation  of  the  Spearman-Brown  reliability  coefficients  #  the 
speed  criterion  correlations  for  the  composite  of  both  givings  was 
.37,  N  =  21.  If  the  unsatisfactory  W-W  form  substitution  were 
omitted  (see  p.  22),  this  correlation  would  be  .53  upon  N  =  20  tests. 
For  "rate  of  reactions"  as  measured  by  the  ratio  of  the  mean  score 
of  a  test  to  its  time  allotment  (see  p.  22)  the  rho-coefficient  upon 
N = 21 was .32. These positive coefficients, though deficient in
reliability because of the small N's, nevertheless suggest the
conclusion that by choosing tests which possess a greater degree of
statistical reliability and also require a more rapid rate of reactions
one will tend to elicit a larger correlation with measures of speed in
typing.


CHAPTER  V 

The  Accuracy  Criterion 

To establish an "accuracy criterion" is by no means as
straightforward a task as in the case of speed. The first problem concerns
the error count.

An inspection of the papers showed that almost all errors occurred
within a word considered together with its following space or
punctuation, e.g., striking the wrong key, transposition of letters,
"strike-overs," "piling" of letters, omission of spacing, insertion of a
space within a word, omitted or lightly struck letters, reduplicated and
added letters or characters, "x-ed out" letters, etc. The only other
group of errors occurring often enough to justify special study were
the omitted and repeated or added words. Errors in the arrangement
of the paper, e.g., inter-line spacing, length of lines, indentations,
paragraphing, etc., occurred so rarely that they could safely
be relegated to the group of intra-word errors in the subsequent
calculations.

Omissions and repetitions or additions of words presented a serious
difficulty. Though occurring infrequently their number was
extremely variable. For the number of omitted words in the total 48
minutes of typing the mean was 6.35 and the S.D. was 9.78. For
the number of repeated or added words the mean and S.D. were
2.43 and 5.03, respectively. The frequency distributions of the
omitted words and of the repeated or added words were in the general
form of steep J-curves. One notes a definite rise in the frequencies
for the ranges of about 10 to 13 words for both variables. This was
found to be due to the fact that many occasions involved the
omission or repetition of about an entire line of the printed copy.

The product moment correlation of the absolute number of omitted
words in the pool of the first plus third days' testing # those for
the pool of the second plus fourth days' testing was -.0035±.039,
N = 304, and for the number of repeated or added words this
correlation was .0440±.039, N = 304.

Frequency Distribution of Omitted Words

Number of omitted words     9   10   11   12   13   14   15   16   17
Frequency                   3   11   11   15    9    5    3    6    1

Number of omitted words    18   19   20   21   22   23   24   25   26
Frequency                   0    2    2    2    4    1    1    5    2

Number of omitted words    27   28   29   33   35   36
Frequency                   0    2    1    1    1    2

Number of omitted words    38   46   49   56   61
Frequency                   1    1    1    1    1


Frequency Distribution of Repeated or Added Words

Number of repeated or added words     0    1    2    3    4    5    6
Frequency                           182   37   17    6   12    4    4

Number of repeated or added words     7    8    9   10   11   12   13
Frequency                             7    3    2    2    4    7    4

Number of repeated or added words    14   15   16   17   18   19   20
Frequency                             2    2    0    1    0    1    1

Number of repeated or added words    21   22   23   24   25   31
Frequency                             1    1    0    1    2    1


The form of the distributions as well as a priori considerations
suggest that the act of omitting or repeating words is a unitary
psychological incident regardless of the number of words involved, and
that a truer measure of this tendency would be on the basis of a
dichotomization into "no words omitted" versus "one or more words
omitted." In incidents involving the omission or repetition of a few
words only, it is probable that the typist is thinking several words
in advance of her operating, and the omission or repetition occurs
through some slip of the psychophysical coordination. For the
omission or repetition of longer series of words, the fault is frequently
due to the phraseology of the copy, in which two successive portions end
in the same words. An actual illustration is provided in the material
reproduced on page 16, where many omissions and repetitions occurred
because two near-by phrases ended with "worms."

Accordingly tetrachoric correlations (Pearson, 1900, p. 6, or Kelley
No. 207) were calculated as a check upon the reliability coefficients
obtained by the product moment method. The requirements of the
$r_t$ procedure may be satisfied by the hypothesis that if an
adequate measure of the tendencies to omit or repeat were available,
our zero scores would redistribute themselves in such a way as to
form part of an approximately normal frequency surface. Among 304
cases, 108 made no omissions in either series of tests, 94 did not omit
in the first series but did omit in the second, 49 omitted in the first
but did not in the second, and 53 omitted in both series. The fourth
degree equation for this reliability of omissions was

\[ .0831777654 = r_t + .0087868205\, r_t^2 + .1363603594\, r_t^3 + .0026249903\, r_t^4 + \cdots \]

whence $r_t$ = .08 with a probable error (Kelley No. 213) of ±.06.
For repetitions or additions the corresponding distribution was 182,
56, 37 and 29. The fourth degree equation for this reliability of
repetitions or additions was

\[ .35094298 = r_t + .2283632942\, r_t^2 + .0426573629\, r_t^3 + .1208543117\, r_t^4 + \cdots \]

whence $r_t$ = .32±.07.
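
The two roots quoted can be recovered numerically from the fourth degree equations as printed. The sketch below applies Newton's method to the series truncated after the fourth power; the truncation, the starting value of zero, and the function name are assumptions made for illustration, and the higher terms that the original leaves unstated are simply ignored.

```python
def solve_truncated_series(lhs: float, coeffs: list, r0: float = 0.0,
                           tol: float = 1e-10, max_iter: int = 50) -> float:
    """Solve lhs = c1*r + c2*r^2 + ... for r by Newton's method.

    coeffs[k] is the coefficient of r**(k + 1).
    """
    r = r0
    for _ in range(max_iter):
        f = sum(c * r ** (k + 1) for k, c in enumerate(coeffs)) - lhs
        df = sum((k + 1) * c * r ** k for k, c in enumerate(coeffs))
        step = f / df
        r -= step
        if abs(step) < tol:
            break
    return r


if __name__ == "__main__":
    omissions = solve_truncated_series(
        0.0831777654, [1.0, 0.0087868205, 0.1363603594, 0.0026249903])
    repetitions = solve_truncated_series(
        0.35094298, [1.0, 0.2283632942, 0.0426573629, 0.1208543117])
    print(f"omissions: {omissions:.4f}, repetitions: {repetitions:.4f}")
```

Rounded to two places the roots agree with the .08 and .32 reported, so the truncation at the fourth degree does not affect the coefficients at the precision given.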

The obtained reliabilities of the omissions and repetitions were of
such small magnitude that in the present data these scores,
particularly in the case of omissions, appear to be only little more than
the result of chance factors, and can scarcely be considered as measures
of anything. Even if these obtained reliability correlations were
significantly positive, the amount of repeated testing necessary to bring
their product moment reliability r's up to a size sufficient for
purposes of measurement would be impracticable.

The product moment correlations of absolute number of omitted
words and of repeated or added words # total or gross number of
words were -.04±.04 and -.03±.04, respectively. The product
moment correlation of omitted words # repeated words was .20±.04.

If the actual number of omitted and repeated or added words
were included in the error count, the result would be a decrease in
the reliability of the error count with no gain in its capacity for being
predicted. (The effect of these factors on the reliability of the
International Typewriting Contest Rules scoring is discussed in a
subsequent section.) Therefore it was decided to depart from the I. T. C.
Rules scoring with reference to omitted and repeated or added words,
and as a sort of compromise to count any such incident as one error
regardless of the number of words actually omitted or repeated and
added.*

The obtained reliability of this error count for the sum of the first
plus third days' testing # the sum of the second plus fourth days'
testing was .77±.02, and the Spearman-Brown reliability for all
four days' testing was .87, N = 304. The partial correlation of error
count for constant number of gross words, i.e.

\[ r_{e_1 e_2 \cdot g_1 g_2} \]   (see Table VI)

remains at .77, N = 304, to the nearest second place, and the
correlation of the ratios of errors to gross words in the first plus the third
days' testing # these ratios for the second plus the fourth days'
testing, i.e.

\[ r_{\left(\frac{e_1}{g_1}\right)\left(\frac{e_2}{g_2}\right)} \]

was .74, N = 304. For the zero-order reliability correlation of the
error count plus omitted words plus repeated and added words the
obtained coefficient was .54±.03, and the Spearman-Brown
reliability was .70. The intercorrelations of all of these factors are
presented in Table VI, page 43.

For  the  total  48  minutes  of  typing  the  mean  error  count  was  54.5 
and  the  S.D.  was  23.34. 

The "accuracy criterion" was obtained by dividing the difference
between gross words and error count by the gross words, i.e.

\[ \frac{\text{gross words} - \text{error count}}{\text{gross words}} \]

It may be conceived roughly as the ratio of correct words to attempted
words, since almost all errors were intra-word errors. The reliability
coefficient obtained upon the pooled alternate days was .74±.02,
and the Spearman-Brown reliability for the total 48 minutes was .85.
The mean accuracy ratio was .9750 and the S.D. was .0111. The
frequency distribution was markedly skewed (see Table IV). In
terms of strokes, on the basis of 4.73 strokes per word, the mean
accuracy ratio would be not far from .99485.
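
The accuracy ratio and its stroke-basis counterpart can be illustrated from the group means already reported (2230.1 gross words, 54.5 errors, 4.73 strokes per word). This is only a sketch with an illustrative function name. Note that a ratio computed from the two means need not equal the mean of the 304 individual ratios, so the first figure is only expected to fall near the .9750 reported.

```python
def accuracy_ratio(gross_units: float, error_count: float) -> float:
    """(gross units - errors) / gross units, for words or strokes alike."""
    if gross_units <= 0:
        raise ValueError("gross_units must be positive")
    return (gross_units - error_count) / gross_units


if __name__ == "__main__":
    mean_gross_words = 2230.1
    mean_errors = 54.5
    strokes_per_word = 4.73

    word_basis = accuracy_ratio(mean_gross_words, mean_errors)
    # On a stroke basis each error is assumed to spoil a single stroke.
    stroke_basis = accuracy_ratio(mean_gross_words * strokes_per_word, mean_errors)
    print(f"word basis: {word_basis:.4f}, stroke basis: {stroke_basis:.5f}")
```

The stroke-basis figure comes out close to the value of about .99485 given in the text.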

* This relatively mild "penalty" is to be considered only as the most defensible
"weighting" of this type of error for the purposes of measurement and prediction,
and may be applicable only in case the subjects believe that their papers will be
scored according to the conventional rigorous standard. For purposes of
instructional discipline or office morale, the idea of "ten words off for each
error" is probably not too severe.

The intercorrelation of the speed and accuracy criteria when these
are obtained upon the same group of typed papers is of the form
wherein some spurious index correlation (Pearson, 1897) of negative
sign may be induced. Accordingly it was thought desirable to
ascertain whether this effect was observable in the present case.
In the treatment of this problem the writer was greatly aided by
Professor T. L. Kelley who has very kindly permitted his original
contribution, "The Correlation between Speed and Accuracy," to
appear in this monograph.

The mechanism by which this spurious correlation is produced
may be explained as follows: If the number of "right" words = R
and the number of "wrong" words = W, the correlation of the
speed measures and accuracy ratios based upon them may be
symbolized by

\[ r_{(R+W)\left(\frac{R}{R+W}\right)} \]

Now in the taking of a test certain extreme chance variations and
accidental or "observational" errors are likely to occur whereby
constant irrelevant elements are added to the paired underlying
measures so that a pair of obtained measures may actually become

\[ (R + W + k_R + k_W) \quad \text{and} \quad \frac{R + k_R}{R + W + k_R + k_W} \]

While in any series of measures such chance variations or errors are
likely to occur in both R and W elements whereby irrelevant effects
tending in opposite directions are produced, the prevailing tendency
would be toward a spurious correlation of negative sign.

This danger may be avoided by taking the speed measures from
one set of papers and the accuracy ratios from another set. Upon
the present data the correlation of speed measures from the first plus
third days' testing # the accuracy ratios from the second plus fourth
days' testing was found to be .18±.04, and for the speed measures
from the second plus fourth days' testing # the accuracy ratios from
the first plus third days' testing the correlation was .22±.04. The
theoretical maximum correlation between these measures was .24,
N = 304. Professor Kelley has derived formulae whereby this
theoretical maximum correlation may be obtained from the original
scores when these are given in terms of correct and incorrect items
(see his formulae 10 and 12, this monograph).

In order to ascertain whether a spurious correlation would be
operative to a serious degree, the coefficients were calculated upon
the measures obtained from the same groups of papers. Upon the
amalgamated first plus third days' testing the correlation of speed
# accuracy was .21±.04, and upon the amalgamated second plus
fourth days' testing this correlation was .25±.04. Upon the total
four days' testing the correlation of speed measures # accuracy ratios
including whatever spurious effects may have existed was found to
be .22±.04, and the theoretical maximum correlation was .24,
N = 304.
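
The "theoretical maximum" of .24 is numerically close to what the classical correction for attenuation yields from the obtained coefficient of .22 and the Spearman-Brown reliabilities reported earlier (.95 for speed, .85 for accuracy). Whether the value was in fact obtained in this way is an assumption; Professor Kelley's formulae 10 and 12 are not reproduced here, and the sketch below shows only the standard correction.

```python
import math


def corrected_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimated correlation between perfectly reliable measures of x and y."""
    return r_xy / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Obtained speed # accuracy coefficient and the two 48-minute reliabilities.
    print(round(corrected_for_attenuation(0.22, 0.95, 0.85), 3))
```

The same computation with the typing-grades reliability of .71 gives values near the maxima quoted later for the correlations with the total typing grades.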

The  fact  that  the  spurious  effects  in  these  measures  apparently 
produced  no  significant  differences  in  the  resultant  coefficients 
justifies  the  acceptance  of  the  coefficient  of  .22 ±.04  obtained  upon 
the  total  48  minutes  of  testing  as  the  value  of  this  product  moment 
correlation  for  subsequent  calculations. 

The skewed distribution of the accuracy ratios suggested the
possibility of curvilinearity of regression in its correlation with the speed
measures. Accordingly these measures were arranged in a 15 × 18-fold
scatter-diagram (see Table IV) in order that correlation ratios
(Pearson, 1905) could be computed and a test for rectilinearity made by
means of Blakeman's (1905) criterion. The product moment
correlation upon this 15 × 18-fold table was .2176. The correlation ratios
calculated by means of the "working formula,"

\[ \eta_{xy} = \sqrt{\frac{N \sum \left\{ \dfrac{(\sum X_a)^2}{N_a} \right\} - (\sum X)^2}{N \sum X^2 - (\sum X)^2}} \]

wherein $X_a$ = a score in a single Y-array, $N_a$ = the frequency of this
Y-array, X = any score in the whole distribution, and N = the
frequency of the whole distribution, were

$\eta$ of speed on accuracy = .3038
$\eta$ of accuracy on speed = .2667

The values of Blakeman's criterion (see Brown and Thomson, 1921,
p. 113)

\[ \frac{\sqrt{N}}{.67449} \cdot \frac{1}{2} \sqrt{\eta^2 - r^2} \]

for the two $\eta$'s were 2.74 and 1.99, respectively. Since only one of
the obtained values for the correlation ratio is inside the
conventional limits of ±2.5 P.E. there is a possibility that the correlation
of our measures of speed # accuracy is not strictly rectilinear.
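
Both quantities are easy to recompute. The sketch below implements the working formula for the correlation ratio on scores grouped by array, and Blakeman's criterion in the form (sqrt(N) / .67449) x (1/2) x sqrt(eta^2 - r^2), which reproduces the two values reported above. The function names and the tiny grouped data set are illustrative assumptions; the original 15 x 18 scatter-diagram is not reproduced.

```python
import math


def correlation_ratio(arrays: list) -> float:
    """Correlation ratio of X on Y; each inner list holds the X scores of one Y-array."""
    n = sum(len(a) for a in arrays)
    sum_x = sum(sum(a) for a in arrays)
    sum_x2 = sum(x * x for a in arrays for x in a)
    between = sum(sum(a) ** 2 / len(a) for a in arrays if a)
    numerator = n * between - sum_x ** 2
    denominator = n * sum_x2 - sum_x ** 2
    return math.sqrt(max(0.0, numerator / denominator))


def blakeman_criterion(n: int, eta: float, r: float) -> float:
    """Blakeman's test of linearity, in probable-error units."""
    return math.sqrt(n) / 0.67449 * 0.5 * math.sqrt(eta ** 2 - r ** 2)


if __name__ == "__main__":
    # Hypothetical scores in two arrays, to exercise the working formula.
    print(round(correlation_ratio([[1, 2, 3], [4, 5, 6]]), 4))
    # Values from the text: N = 304, r = .2176.
    print(round(blakeman_criterion(304, 0.3038, 0.2176), 2))
    print(round(blakeman_criterion(304, 0.2667, 0.2176), 2))
```

A value above 2.5 is conventionally read as evidence against strictly linear regression, which is the case only for the regression of speed on accuracy.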

The intercorrelations of the speed and accuracy criteria # the total
typing grades criterion afford a measure of the relative importance
of these two essential factors in the estimation of the classroom
teachers based upon the entire training course of three years. The
total typing grades, however, are far from a satisfactory "ultimate
criterion" because of their low reliability. The correlation of the
grades earned during the three first semesters # those earned during
the three second semesters was only .55±.03, with a Spearman-Brown
reliability of only .71, N = 304, for the total three years.

The intercorrelation of the speed measures # T. T. G. = .31±.04,
with a theoretical maximum correlation of .38, N = 304. For the
intercorrelation of the accuracy ratios # T. T. G. the obtained
coefficient was .45±.03, and the theoretical maximum correlation was
.58, N = 304. The multiple correlation (Yule, 1922, pp. 233-248)
of T. T. G. # the best-weighted composite of speed and accuracy
upon the obtained coefficients = .50, N = 304. The regression
equations were calculated in terms of the average teachers' mark for the
three-year course ($X_1$), the total number of attempted words for the
48 minutes of typing ($X_2$), and the accuracy ratio ($X_3$). In terms of
deviations from the means in a form independent of the differences
in variabilities (see Kelley, 1923, p. 283)

\[ x_1 = .2276\,\frac{\sigma_1}{\sigma_2}\,x_2 + .3994\,\frac{\sigma_1}{\sigma_3}\,x_3 \]

The  superiority  of  the  correlation  and  regression  coefficients  for 
the  accuracy  scores  indicates  that  teachers  weight  accuracy  more 
highly  than  speed  in  assigning  terminal  grades.  The  significance  of 
the  relatively  low  multiple-R  of  .50  will  be  discussed  in  the  section 
concerning  the  criterion  based  upon  the  total  typing  grades. 
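
The multiple correlation and the relative weights can be approximated from the three rounded coefficients quoted above (.31, .45, and .22). The sketch uses the standard two-predictor formulas for standardized regression weights; the function name is illustrative, and because the inputs are rounded to two places the weights agree with the reported .2276 and .3994 only approximately.

```python
import math


def two_predictor_regression(r12: float, r13: float, r23: float) -> tuple:
    """Standardized weights for predictors 2 and 3, and the multiple R for variable 1."""
    denom = 1.0 - r23 ** 2
    beta2 = (r12 - r13 * r23) / denom
    beta3 = (r13 - r12 * r23) / denom
    multiple_r = math.sqrt(beta2 * r12 + beta3 * r13)
    return beta2, beta3, multiple_r


if __name__ == "__main__":
    # 1 = total typing grades, 2 = speed, 3 = accuracy ratio.
    beta_speed, beta_accuracy, big_r = two_predictor_regression(0.31, 0.45, 0.22)
    print(f"speed weight {beta_speed:.3f}, accuracy weight {beta_accuracy:.3f}, "
          f"R = {big_r:.2f}")
```

The accuracy weight is nearly twice the speed weight, which is the basis for the statement that teachers weight accuracy more highly than speed.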

The criterion correlations for the test variables are presented in
Table V.

The most conspicuous fact is that the general tendency is a zero
correlation between the accuracy ratio # psychological test scores,
the signs of the r's for exactly half of the Alpha and I. E. R. elements
being negative. The values of the theoretical maximum correlations
in this table probably have no significance. The only coefficients
possessing "statistical significance," i.e., which are larger than three
times their probable error, are those for Alpha 8, I. E. R. 2, first
year English grades, and chronological age.

A comparison of these r's with those upon the speed criterion
suggests the conclusion of some interest in predictive testing that while
accuracy is adjudged the more important element of typing
proficiency from the standpoint of teachers' marks, and also, as it will
be shown later, from that of the "official" International Typewriting
Contest Rules, it is not with the accuracy factor but with speed that
psychological tests yield their most promising correlation coefficients.


TABLE V

Criterion Correlations: Accuracy

                       N    First        Second       Both          Theoretical
                            application  application  applications  maximum
                                                                    correlation

Alpha total           123   -.04±.06     -.02±.06     -.03±.06       -.04
Alpha 1               123    .13±.06     -.01±.06      .08±.06        .10
Alpha 2               123    .10±.06      .08±.06      .10±.06        .12
Alpha 3               123   -.16±.06     -.02±.06     -.09±.06       -.11
Alpha 4               123    .04±.06      .08±.06      .07±.06        .08
Alpha 5               123   -.02±.06     -.06±.06     -.04±.06       -.05
Alpha 6               123    .10±.06      .13±.06      .12±.06        .13
Alpha 7               123    .02±.06     -.04±.06     -.01±.06       -.01
Alpha 8               123   -.18±.06     -.20±.06     -.22±.06       -.27
I. E. R. total        123    .05±.06      .05±.06      .05±.06        .06
I. E. R. 1            123   -.01±.06      .01±.06     -.00±.06       -.00
I. E. R. 2            123    .21±.06      .23±.06      .23±.06        .27
I. E. R. 3            123    .07±.06      .12±.06      .10±.06        .11
I. E. R. 4            123   -.14±.06     -.03±.06     -.10±.06       -.12
I. E. R. 5            123   -.03±.06     -.10±.06     -.07±.06       -.08
I. E. R. 6            123    .13±.06     -.05±.06      .06±.06        .07
I. E. R. 7            123   -.00±.06     -.05±.06     -.03±.06       -.04
I. E. R. 8            123   -.07±.06     -.17±.06     -.13±.06       -.15
I. E. R. 9            123    .07±.06      .02±.06      .05±.06        .06
I. E. R. 10           123   -.08±.06      .09±.06      .01±.06        .01
W-W n. g. c.           77   -.08±.07      .06±.07     -.02±.07       -.02
  (ditto)             137    .17±.08      .19±.06      .19±.06        .21
W-W d. c.              77   -.00±.07     -.03±.07     -.01±.07       -.02
  (ditto)             137    .09±.06      .17±.06      .14±.06        .18
W-W subst.             77   -.03±.07      .06±.07      .01±.07        .02
  (ditto)             137    .15±.06      .10±.06      .15±.06        .21
Otis                  123   -.02±.06
  (ditto)             153    .08±.05
F. Y. E. G.           304    .15±.04                                  .19
Chron. Age            304   -.16±.04                                 -.17
Student activities    240    .09±.04                                  .11


Upon  our  accuracy  criterion  a  representative  sampling  composed  of 
twenty-four  pencil-and-paper  tests  yielded  a  battery  of  r's  which  are 
scarcely  superior  to  those  which  would  be  obtained  upon  measures 
in  which  the  "true  correlation"  is  zero,  such  as  would  result  from 
dice  throws  or  card  draws.  Furthermore,  the  tests  of  the  I.  E.  R. 
and  W-W  series,  which  yielded  substantially  larger  r's  upon  the  speed 
criterion,  showed  no  superiority  in  predicting  accuracy. 

In order to ascertain whether these correlation coefficients are
influenced by the reliability and "rate of required reactions" (see
p. 22) of the tests themselves, rho-correlations were obtained upon
N = 21 tests. Since almost half of the criterion correlation coefficients
were negative, they were considered in order of "positiveness,"
e.g., -.22 was ranked lowest and .23 was ranked highest. For
Spearman-Brown reliability coefficients # criterion correlations for
the composite of both givings of the test, rho = .44, N = 21, and if
the unsatisfactory W-W form substitution reliability coefficient is
omitted, rho = .55, N = 20. For "rate of reactions" # criterion
correlations, rho = .22, N = 21.

In order to determine whether the same tests which showed superior
correlations with speed also tend to be better predictors of accuracy,
the correlation between the criterion correlation coefficients for
speed # those for accuracy was obtained. This rho = .32, N = 21.

These rho's suggest that in general higher correlations with both
speed and accuracy are likely to result from an improvement in the
reliability and from an increase in the rate of reactions in the
psychological tests themselves, and also that the tests which are better
predictors of speed are likely to be better predictors of accuracy.
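The rank-order procedure described above, with signed coefficients ranked by "positiveness," may be sketched as follows. The sketch assumes the usual rank-difference formula, rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), and no tied values. The four pairs of figures are hypothetical and are not taken from the tables.

```python
def ranks(values):
    """Rank from lowest (1) to highest (n); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Rank-difference correlation between two equally long lists."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Hypothetical test reliabilities and signed criterion correlations.
reliabilities = [0.90, 0.75, 0.60, 0.85]
criterion_rs = [0.23, -0.22, 0.05, 0.10]
print(spearman_rho(reliabilities, criterion_rs))
```

Ranking the signed values directly places -.22 lowest and .23 highest, which is the ordering by "positiveness" that the text describes.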


CHAPTER  VI 

The  International  Typewriting  Contest  Rules  Criterion 

As  a  means  of  combining  speed  and  accuracy  into  a  single  objective 
measure,  the  entire  group  of  3648  typed  papers  were  re-scored  in 
conformity  with  the  "official"  International  Typewriting  Contest 
Rules  (see  Kimball,  1921).  The  Rules  are  based  upon  arbitrary 
considerations  rather  than  upon  scientific  determinations,  but 
since  the  data  of  this  study  do  not  provide  a  final  basis  for  deciding 
upon  the  proper  weightings  of  gross  words,  errors,  and  omitted  or 
repeated  words,  it  was  decided  to  abide  by  the  conventional  practice 
rather  than  to  resort  to  some  new  arbitrary  system. 

The  significant  features  of  the  I.  T.  C.  weightings  are  that  for 
each  error  ten  words  are  deducted  from  the  gross  words,  each  omitted 
and  repeated  word  counting  as  an  error,  and  that  gross  words  are 
counted  from  the  printed  copy  instead  of  from  the  subject's  actual 
typed  product.  Accordingly,  in  comparison  with  actual  gross  words, 
errors  receive  a  weighting  or  penalty  of  —  10,  and  the  resultant 
penalties  for  omitted  and  repeated  words  are  —  9  and  —  11, 
respectively. 
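The arithmetic behind the penalties of -10, -9 and -11 can be sketched as follows. This is one reading of the rule as described above, not a transcription of the official Rules; the function and argument names are hypothetical, and `other_errors` stands for errors that are neither omitted nor repeated words.

```python
def itc_net_words(actual_gross, other_errors, omitted, repeated):
    """Net words when gross words are counted from the printed copy."""
    # An omitted word is in the copy but not typed; a repeated word is
    # typed but adds nothing to the count taken from the copy.
    printed_gross = actual_gross + omitted - repeated
    total_errors = other_errors + omitted + repeated
    return printed_gross - 10 * total_errors

def net_words_from_penalties(actual_gross, other_errors, omitted, repeated):
    """The same score written with the resultant penalties."""
    return actual_gross - 10 * other_errors - 9 * omitted - 11 * repeated

# Hypothetical paper: 2230 words actually typed, 50 errors, 6 omitted
# words and 2 repeated words.
print(itc_net_words(2230, 50, 6, 2))
print(net_words_from_penalties(2230, 50, 6, 2))
```

Both functions return the same score, which is the sense in which omitted and repeated words carry resultant penalties of -9 and -11 relative to the actual gross words.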

Upon  the  I.  T.  C.  scoring,  the  number  of  "net  words"  for  the  total 
48  minutes  of  testing  was  1617.1,  and  the  S.D.  was  356.9.  This 
means  an  average  rate  of  33.7  net  words  per  minute.  For  the 
frequency  distribution  see  Table  VIII. 

Since  the  problem  of  the  best  weighting  of  these  factors  could 
not  be  studied  objectively  from  the  standpoint  of  "validity"  because 
of  the  lack  of  a  satisfactory  "outside  criterion,"  an  investigation 
was  made  to  ascertain  whether  their  effect  upon  reliability  would 
suggest  any  desirable  change  in  the  I.  T.  C.  Rules. 

In  order  to  attack  this  problem  most  advantageously  it  was 
necessary  to  calculate  all  the  intercorrelations  of  these  factors  upon 
equivalent  "halves"  of  the  sets  of  typed  papers.  In  Table  VI, 
a  =  number  of  actual  gross  words,  e  =  "error  count"  (see  p.  35), 
o  =  number  of  omitted  words,  and  r  =  number  of  repeated  (and 
added)  words.  Subscript  1  denotes  these  measures  upon  the  papers 
of  the  first  plus  third  days'  testing,  and  subscript  2  denotes  these 
measures  upon  the  papers  of  the  second  plus  fourth  days'  testing. 
The  fact  that  the  "error  count"  includes  a  count  of  one  error  for 
each  act  of  omitting  or  repeating  of  words  probably  has  no  significant 
influence  upon  its  correlations. 
The means and standard deviations for the halves and for the
totals were as follows:

Measure     Mean          S.D.

a1          [illegible]   [illegible]
a2          [illegible]   [illegible]
a(1+2)      2230.1        237.8

e1          [illegible]   [illegible]
e2          [illegible]   [illegible]
e(1+2)      54.50         23.34

o1          2.34          6.27
o2          4.04          7.17
o(1+2)      6.38          [illegible]

r1          1.06          3.33
r2          1.37          3.59
r(1+2)      2.43          5.03

Upon  the  total  48  minutes  of  typing  the  intercorrelations  were 
as  follows: 

gross words # error count                      r =  .00±.04
gross words # omitted words                    r = -.04±.04
gross words # repeated words                   r = -.03±.04
error count # omitted words                    r =  .12±.04
error count # repeated and added words         r =  .16±.04
omitted words # repeated and added words       r =  .20±.04
gross words # the sum of error count,
  omitted and repeated words                   r = -.02±.04

The correlation of the sum of error count, omitted and repeated
words in the first plus third days' testing # those in the second plus
fourth days' testing was .54±.03. The correlations of the actual
gross words # the correct words for the first plus third days' testing
and for the second plus fourth days' testing, i.e.

\[ r_{(a_1)(a_1 - e_1)} \quad \text{and} \quad r_{(a_2)(a_2 - e_2)} \]

were .996 and .993, respectively, N's = 304, and upon the total
four days' testing this correlation, i.e.

\[ r_{(a_{(1+2)})(a_{(1+2)} - e_{(1+2)})} \]

was .995, N = 304.

By substituting the r's and S. D.'s which have been cited above
in the Spearman sums-and-differences formula (Spearman, 1913,
p. 419, or Kelley No. 147) one may ascertain the effect of different
weightings upon the reliability of the composite without re-scoring
the original papers, since this procedure requires no special
assumptions (other than arithmetical accuracy!). If the weighting of the
error count (e) alone is to be considered, the expansion of the formula
becomes

\[
r_{(a_1 + w e_1)(a_2 + w e_2)} =
\frac{r_{a_1 a_2}\sigma_{a_1}\sigma_{a_2}
    + w\, r_{a_1 e_2}\sigma_{a_1}\sigma_{e_2}
    + w\, r_{e_1 a_2}\sigma_{e_1}\sigma_{a_2}
    + w^2 r_{e_1 e_2}\sigma_{e_1}\sigma_{e_2}}
{\sqrt{\sigma_{a_1}^2 + 2 w\, r_{a_1 e_1}\sigma_{a_1}\sigma_{e_1} + w^2 \sigma_{e_1}^2}\;
 \sqrt{\sigma_{a_2}^2 + 2 w\, r_{a_2 e_2}\sigma_{a_2}\sigma_{e_2} + w^2 \sigma_{e_2}^2}}
\]

The factor w is the weighting to be tried out, the rule-of-thumb
being to multiply in the equation every standard deviation of the
variable in question by its weighting.

If it is desired to try out the effect of a differential weighting
system upon all three types of error-variables, the expanded equation
for

\[ r_{(a_1 + w_e e_1 + w_o o_1 + w_r r_1)(a_2 + w_e e_2 + w_o o_2 + w_r r_2)} \]

is equally straightforward of operation though lengthy, involving
16 algebraic terms in the numerator and 20 in the denominator.

An empirical weighting of the error-variables by means of these
formulae quickly showed that any weighting whatsoever other than
zero had the effect of decreasing the reliability.

A criterion based upon gross words, then, is the most reliable of
the objective measures to be obtained from typed papers. A measure
based upon correct words only, i.e. in which the weighting of e is -1,
is almost equally reliable, the obtained coefficient upon the measures
of 24 minutes dropping from .9083 to .9067 when error count alone
is considered. Such a scoring is also practically equivalent for
measurement purposes since the correlation of gross words # gross
words minus error count upon the total 48 minutes of testing, i.e.

\[ r_{(a_{(1+2)})(a_{(1+2)} - e_{(1+2)})} \]

was .995, N = 304.

But with larger weightings the decreases in reliability become
palpable. When error count alone is considered, a weighting of ±5
brings the reliability down to .878; when w = ±10, the reliability
becomes .835; when w = ±50, r becomes .771; when w = ±100,
r becomes .767 or .768; and for the absurd weightings of ±500,
±1000 and ±5000, r remains stationary at .7662+.

When all three types of error-variables are weighted uniformly, i.e.

\[ r_{(a_1 + w e_1 + w o_1 + w r_1)(a_2 + w e_2 + w o_2 + w r_2)} \]

the decreases in reliability with increases in weighting are more
rapid. When w = -1, r drops from .9083 to .9020; when w = -5,
r becomes .80; when w = -10, r becomes .69; when w = -50, r
becomes .55; when w = ±100, r becomes .54, and for the absurd
weightings of ±500, ±1000, and ±5000, r remains stationary
at .539+.

What is the relation of these findings to the reliability of the I. T. C.
scoring? The obtained reliability for the actual I. T. C. scoring
of the papers was .69±.02 between halves of the series, with a
Spearman-Brown reliability of .82 for the total 48 minutes. If gross
words and error count were treated as in this study, i.e. gross words
being counted from the actual typed production and any incident
of omission and repetition being treated as one error regardless of
the number of words involved, the reliability of the scores with
errors penalized -10 upon halves would be .83±.01 with a Spearman-
Brown reliability of .91 for the total 48 minutes.

The effect of penalizing each omitted or repeated word as a
separate error, then, is to reduce markedly the reliability of the
resulting scores. This loss in reliability however may not be
irreplaceable, since upon the assumptions underlying the Spearman-
Brown formula (Kelley Nos. 158 or 159) the reliability of a test is a
function of the amount of testing, which may be measured in this
case by the amount of time devoted to testing. In order to bring
the reliability of the I. T. C. scoring up to that which was obtained
upon the basis of the "error count" of this study in 48 minutes of
testing, the theoretical expectation for the I. T. C. scoring is a time
requirement of about 105 minutes. In order to bring the reliability
of both scorings up to .95, the theoretical expectations would be
about 90 minutes for the simple error count scoring and about 200
minutes for the I. T. C. scoring. Thus the I. T. C. Rules' treatment
of errors may not be serious from the standpoint of reliability for
daily classroom records since the time given to actual speed testing
may be more than adequate, but in the case of special testing for
experimental purposes or for contests in which the time limits range
from 15 to 60 minutes, the reliability factor is important enough
to justify some revision of the conventional scoring rules.
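The time estimates above follow from the Spearman-Brown formula, and the calculation can be sketched as follows. The sketch assumes that each half of the series represents 24 minutes of typing and uses the half-to-half reliabilities of .69 and .83 quoted in the text; the function names are hypothetical.

```python
def spearman_brown(r, k):
    """Reliability of a test lengthened k times, given reliability r."""
    return k * r / (1.0 + (k - 1.0) * r)

def length_factor(r, target):
    """Lengthening factor needed to raise reliability r to target."""
    return target * (1.0 - r) / (r * (1.0 - target))

HALF_MINUTES = 24  # each half of the 48 minutes of testing
R_ITC, R_ERROR_COUNT = 0.69, 0.83

print(round(spearman_brown(R_ITC, 2), 2))          # I. T. C., 48 minutes
print(round(spearman_brown(R_ERROR_COUNT, 2), 2))  # error count, 48 minutes

# Minutes of I. T. C. scoring needed to match the error-count scoring.
target = spearman_brown(R_ERROR_COUNT, 2)
print(round(HALF_MINUTES * length_factor(R_ITC, target)))

# Minutes needed by each scoring to reach a reliability of .95.
print(round(HALF_MINUTES * length_factor(R_ERROR_COUNT, 0.95)))
print(round(HALF_MINUTES * length_factor(R_ITC, 0.95)))
```

The figures obtained this way agree with the text to within its rounding: about 105 minutes for the first comparison, and roughly 90 and 200 minutes for a reliability of .95.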

A  study  of  the  correlation  and  multiple  regression  coefficients 
indicates  that  in  the  I.  T.  C.  scoring,  as  well  as  in  the  teachers' 
estimation  as  measured  by  total  typing  grades,  the  accuracy  factor 
is  deemed  more  important  than  the  speed  factor,  or  more  specifically, 
the  penalizing  of  errors  by  —  10  has  the  effect  of  weighting  the 
resultant  score  more  in  the  direction  of  accuracy  than  of  speed.  For 
I. T. C. # gross words, r = .63±.02, with a theoretical maximum
correlation of .71, N = 304, and for I. T. C. # accuracy ratio
these coefficients were .73±.02 and .88, respectively.
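The "theoretical maximum correlation" of .71 appears to be consistent with Spearman's correction for attenuation, in which the obtained r is divided by the square root of the product of the two reliabilities. The sketch below assumes that reading. It takes .82 as the 48-minute reliability of the I. T. C. scoring and steps up the gross-words coefficient of .9083 by the Spearman-Brown formula; both choices are assumptions about which reliabilities were used.

```python
import math

def spearman_brown_doubled(r_half):
    """Reliability of the full series from the half-to-half correlation."""
    return 2.0 * r_half / (1.0 + r_half)

def corrected_for_attenuation(r_xy, rel_x, rel_y):
    """Estimated correlation if both measures were perfectly reliable."""
    return r_xy / math.sqrt(rel_x * rel_y)

rel_itc = 0.82                            # I. T. C. scoring, 48 minutes
rel_gross = spearman_brown_doubled(0.9083)  # gross words, 48 minutes
print(round(corrected_for_attenuation(0.63, rel_itc, rel_gross), 2))
```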

Upon the obtained coefficients the multiple-R of I. T. C. scores #
the best-weighted composite of actual gross words and accuracy ratio
= .87, N = 304. The regression equations, wherein X1 = the total
number of "net" words for the total of 48 minutes upon the I. T. C.
scoring, X2 = the actual gross words in 48 minutes, and X3 = the
accuracy ratio, are as follows: In terms of deviations from the mean,

\[ x_1 = .7392\, x_2 + 20{,}016.9715\, x_3 \]

and in a form independent of the different variabilities of the
measures,

\[ \frac{x_1}{\sigma_1} = .4925\, \frac{x_2}{\sigma_2} + .6237\, \frac{x_3}{\sigma_3} \]

and in terms of gross scores,

\[ X_1 = .7392\, X_2 + 20{,}016.9715\, X_3 - 19{,}547.0561 \]

with a standard error of estimated scores of 277.2.
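The first two forms of the equation are related through the standard deviations: a raw-score weight equals the standardized weight multiplied by the S.D. of the criterion and divided by the S.D. of the predictor. The sketch below checks the gross-words weight against the S.D.'s reported earlier (356.9 for I. T. C. net words, 237.8 for gross words). The S.D. of the accuracy ratio is not given in this section, so the last line only shows the value implied by the two printed weights.

```python
def raw_weight(beta, sd_criterion, sd_predictor):
    """Raw-score regression weight from a standardized (beta) weight."""
    return beta * sd_criterion / sd_predictor

def implied_predictor_sd(beta, sd_criterion, raw):
    """Predictor S.D. implied by a pair of standardized and raw weights."""
    return beta * sd_criterion / raw

SD_ITC = 356.9    # S.D. of I. T. C. net words in 48 minutes
SD_GROSS = 237.8  # S.D. of actual gross words in 48 minutes

print(round(raw_weight(0.4925, SD_ITC, SD_GROSS), 4))
print(round(implied_predictor_sd(0.6237, SD_ITC, 20016.9715), 4))
```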

The  obtained  multiple-R  of  .87  is  low  in  view  of  the  fact  that 
all  of  these  measures  together  are  supposed  to  measure  the  same 
factors  upon  the  very  same  data.  A  satisfactory  explanation  of  this 
discrepancy  can  be  made  on  the  basis  of  the  reduction  of  reliability 
arising  from  the  treatment  of  omitted  words  in  the  I.  T.  C.  scoring, 
and  probably  also  by  reason  of  the  effect  upon  the  meaning  and 
reliability  of  the  error  scores  due  to  their  conversion  into  ratios. 
For a "theoretical maximum multiple correlation" calculated upon
the theoretical maximum zero-order correlations, the purpose of which
is to estimate the correlation which would be obtained if the measures
were perfectly reliable, becomes .992, N = 304, for the correlation
of I. T. C. scoring # the best-weighted composite of gross words and
accuracy ratio. The "theoretical maximum partial correlations"
of I. T. C. scoring # gross words for constant accuracy ratio, and
similarly for I. T. C. scoring # accuracy ratio for constant number of
gross words are both unity (the actually obtained values being 1.08
and 1.04, respectively). Although the I. T. C. scoring includes the
number of omitted and repeated words as variables while the composite
of "speed" and "accuracy" almost ignores them, still when
their correlations are "corrected for attenuation" due to chance
factors (see Spearman, 1904), a term which should be permissible
here in view of the relatively high statistical reliability of the
coefficients themselves, the omitted and repeated words become inert.
Apparently they add little if anything beyond unreliability to the
measures in which they are included. This seems to bring fresh
evidence in support of the conclusion that they are probably little
more than chance occurrences (see pp. 32-34).

The criterion correlations for the test variables are presented in
Table VII. The general trend is in accord with what might be
expected from the fact that these tests showed rather low but positive
correlation with the speed criterion and practically no correlation
at all with the accuracy criterion. Their general level is lower than
in the case of speed, but their sign is usually positive. The I. E. R.
tests are probably superior to those of Army Alpha, while the three
W-W tests yield the highest values of all.

TABLE VII

Criterion Correlations: International Typewriting Contest
Rules Scoring

Variable              N     First        Second       Both          Theoretical
                            application  application  applications  maximum
                                                                    correlation

Alpha total           123    .08±.06     -.00±.06      .04±.06       .05
Alpha 1               123    .02±.06      .14±.06      .08±.06       .11
Alpha 2               123    .13±.06      .01±.06      .08±.06       .10
Alpha 3               123   -.14±.06      .07±.06     -.04±.06      -.05
Alpha 4               123    .04±.06      .12±.06      .08±.06       .11
Alpha 5               123    .01±.06      .14±.06      .08±.06       .10
Alpha 6               123    .18±.06      .04±.06      .12±.06       .14
Alpha 7               123   -.02±.06      .01±.06     -.01±.06      -.01
Alpha 8               123    .02±.06     -.04±.06     -.01±.06      -.02
I. E. R. total        123    .06±.06      .11±.06      .09±.06       .10
I. E. R. 1            123    .04±.06      .15±.06      .10±.06       .12
I. E. R. 2            123    .15±.06      .02±.06      .10±.06       .12
I. E. R. 3            123    .22±.06      .19±.06      .22±.06       .26
I. E. R. 4            123   -.11±.06     -.02±.06     -.07±.06      -.10
I. E. R. 5            123    .04±.06      .05±.06      .05±.06       .06
I. E. R. 6            123    .06±.06     -.01±.06      .03±.06       .04
I. E. R. 7            123    .00±.06      .08±.06      .04±.06       .06
I. E. R. 8            123    .05±.06     -.04±.06      .01±.06       .01
I. E. R. 9            123    .04±.06      .10±.06      .07±.06       .09
I. E. R. 10           123    .04±.06      .13±.06      .10±.06       .12
W-W n. g. c.           77    .05±.07      .20±.07      .20±.07       .16
  (ditto)             137    .15±.06      .19±.06      .19±.06       .20
W-W d. c.              77    .17±.07      .12±.07      .12±.07       .20
  (ditto)             137    .15±.06      .20±.05      .20±.05       .24
W-W subst.             77    .14±.07      .10±.07      .10±.07       .24
  (ditto)             137    .20±.05      .27±.05      .27±.05       .36
Otis                  123    .00±.06
  (ditto)             153    .17±.05
F. Y. E. G.           304    .19±.04                                 .25
Chron. Age            304   -.22±.04                                -.24
Student activities    240    .17±.04                                 .21

As in the case of the criterion correlations for speed and accuracy,
a study was made of the influence of the reliability of the tests and
of their "rate of required reactions" upon their correlations with the
I. T. C. criterion. Upon N = 21 test elements, the rank order
correlation (rho) of the reliability coefficients # the correlations
against the I. T. C. criterion = .49, and if the unsatisfactory W-W
form substitution reliability coefficient were omitted, this rho = .67,
N = 20. For "rate of required reactions" # the I. T. C. criterion
correlations, rho = .44, N = 21. Here too, an improvement in the
reliability of the test variables and an increase in their rate of
required reactions is likely to be attended by an increase in the
magnitude of their criterion correlations.

The fact that the tests which are better predictors of speed and of
accuracy are also better predictors of the I. T. C. criterion is shown
in the following coefficients. For criterion correlations against
"speed" # those against the I. T. C. scoring, rho = .60, N = 21, and
for those against "accuracy" # those against the I. T. C. scoring,
rho = .76, N = 21.


CHAPTER  VII 
The  Total  Typing  Grades  Criterion 

The  teachers'  marks  assigned  in  typing  classes  during  the  entire 
three  year  course  have  been  included  as  a  criterion  measure  in  this 
study  because  they  introduce  a  number  of  factors  not  measured  by 
the purely objective variables based upon typed papers. An important
difference is that they cover the whole period of training
instead of only the final period of class room practice.

The  elements  which  enter  into  the  teachers'  assigning  of  gradings 
are  complex  and  little  known.  The  marks  are  based  ostensibly  upon 
a  pupil's  class  room  attainment,  which  is  itself  the  resultant  of  her 
native  capacity  and  her  successful  utilization  of  that  capacity,  both 
of  which  are  likewise  complex  variables.  Furthermore,  unless 
terminal  gradings  are  based  exclusively  upon  objective  "achievement 
tests,"  a  procedure  which  is  not  yet  possible  in  typing,  there  is  the 
likelihood  that  these  markings  will  be  influenced  by  a  number  of 
elements  in  the  personalities  both  of  the  student  and  the  teacher 
independent  of  the  actual  proficiency  in  question,  e.g.  the  frequently 
noted  tendencies  of  some  teachers  to  give  either  high  or  low  marks 
consistently  (see  Kelly,  1914). 

In  defense  of  teachers'  marks  as  a  criterion,  it  may  be  urged  that 
if  they  do  include  other  factors  not  measured  in  the  students'  actual 
typed  production,  such  factors  also  enter  into  the  future  success 
"on  the  job"  in  which  desirable  social  traits  may  often  more  than 
compensate  for  a  mediocre  proficiency  in  operating  the  machine. 
Finally,  school  marks  are  the  recognized  measure  for  administrative 
purposes  of  the  pupil's  success  in  the  course  of  study. 

The  T.  T.  G.  criterion  is  the  mean  of  the  six  semester  grades  earned 
in  typing  during  the  entire  three  years'  course.  (In  a  few  cases  the 
student  had  reached  this  level  of  training  in  five  semesters,  and  in 
one  case  in  eight  semesters.)  The  marking  system  in  practice  in 
the high school in which these measures were obtained was in numerical
terms, 60 being the "passing mark." The mean grade among
the  304  students  was  76.7,  and  the  standard  deviation  was  6.2.  For 
their  distribution  see  Table  VIII. 

(The  3648  typed  papers  of  this  experiment  were  not  included  in 
the  regular  teachers'  gradings.) 

A serious defect of this criterion is its low reliability. For the
correlation of the mean of the three first semesters # that of the three
second semesters, r = .55±.03, with a Spearman-Brown reliability
for the total six semesters of .71. To improve this reliability by
increasing the number of measures is obviously impossible since all
available grades were already in use. To improve the methods of
grading may bring about some increased reliability, but even if the
marking system could produce a perfectly valid measure of a pupil's
progress, it is doubtful whether her own accomplishment during only
three years is sufficiently stable to achieve a reliability coefficient in
the nineties.

In order to study more closely the reliability of typing grades,
correlation coefficients were calculated upon the grades of the first
semester # those of the second semester for each of the three years. For
the first year this correlation was .44±.03, for the second year, r =
.45±.03, and for the third year, r = .36±.03. These cited coefficients
for the first two years are lower than those which would be obtained
upon all of the pupils in the school during those years since the range
of the present group had become greatly limited through the effect
of a progressive elimination of the poorer students. The reliability
coefficient of .36±.03 for the third year appears very low, even in
view of this selective factor. From the standpoint of their use as a
criterion measure, the third year grades alone would have been quite
unsatisfactory because of their low reliability.

The intercorrelations of the T. T. G. criterion with the speed,
accuracy and I. T. C. criteria are interesting. For T. T. G. # gross
words, r = .31±.04, with a theoretical maximum correlation of .38,
N = 304; for T. T. G. # accuracy ratio these coefficients were .45
±.03 and .58, respectively; and for T. T. G. # I. T. C. these
coefficients were .50±.03 and .66, respectively. The multiple-R of T. T. G.
# the best-weighted composite of gross words and accuracy ratio was
.50, N = 304, and the regression equations, wherein X1 = T. T. G.,
X2 = gross words, and X3 = accuracy ratio, were, in terms of
deviations from the mean,

\[ x_1 = .0059\, x_2 + 222.6684\, x_3 \]

and in a form independent of the variabilities of the measures,

\[ \frac{x_1}{\sigma_1} = .2276\, \frac{x_2}{\sigma_2} + .3994\, \frac{x_3}{\sigma_3} \]

and in terms of gross scores,

\[ X_1 = .0059\, X_2 + 222.6684\, X_3 - 153.6273 \]

with a standard error of estimated scores = 5.37.
The correlation and multiple regression coefficients indicate that
accuracy is more important than speed in the teachers' estimation.

A  "theoretical  maximum  multiple  correlation"  between  T.  T.  G. 
#  the  best-weighted  composite  of  gross  words  and  accuracy  ratio 
was  .64,  N  =  304. 

That  the  T.  T.  G.  #  I.  T.  C.  correlations,  obtained  and  theoretical, 
are  equivalent  to  the  corresponding  multiple  correlations,  obtained 
and  theoretical,  for  T.  T.  G.  #  the  best-weighted  composite  of  gross 
words  and  accuracy  ratio  is  surprising  in  view  of  the  fact  that  the 
I.  T.  C.  scoring  represents  an  arbitrarily  weighted  combination  of 
speed  and  accuracy  while  the  multiple  correlation  coefficients  are 
based  upon  a  statistically  determined  best-weighting  of  the  same 
factors. This is probably due to the fact that the requirements of the
procedures were not satisfied: the accuracy ratios do not form a normal
distribution for the purposes of partial and multiple correlation,
and the standard deviations and perhaps the unitary correlation
coefficients also were not sufficiently uniform in size for the purpose
of the computation of theoretical maximum correlations.

In order to ascertain whether regression was rectilinear in the
correlation of I. T. C. # T. T. G. a 15 × 18-fold scatter-diagram (Table
VIII) was drawn up. The product moment correlation was .4970,
and the two correlation ratios were:

η of I. T. C. on T. T. G. = .5564

η of T. T. G. on I. T. C. = .5203

The values of Blakeman's criterion for these correlation ratios were
3.23 and 1.99, respectively. Thus only one of the obtained η's yields
a criterion value falling within the conventional limits of ±2.5 P. E.

The fact that the correlations between typing grades and the
objective measures are low, even when the factor of unreliability is
accounted for, theoretically at least, indicates that the grades are
composed of other important elements in addition to the speed and
accuracy factors as they are measured in this experiment. If the
maximum correlation of total typing grades # the best-weighted
composite of speed and accuracy on the assumption of perfect
reliability in the three measures is .64, N = 304, how important are
the other unmeasured factors in the T. T. G. criterion? This may
be measured by its correlation with the best-weighted composite
of all the other factors independent of speed and accuracy. The
usual formula for multiple correlation involving a criterion and two
variables (Yule, 1922, p. 248, No. 15)

\[ R_{1(23)} = \sqrt{1 - (1 - r_{12}^2)(1 - r_{13.2}^2)} \]

may in the special case of non-correlation of the two variables be
written

\[ R_{1(23)} \ (\text{when } r_{23} = 0) = \sqrt{r_{12}^2 + r_{13}^2} \]

Let R1(23) = 1.00 and r12 = .64. In order that a correlation of
unity with T. T. G. be attained, then, the composite of all other
variables independent of the speed and accuracy measures must
correlate with T. T. G. to the extent of about .77. This means that
the constant elements entering into the T. T. G. criterion are not
half-measured by the speed and accuracy scores in this experiment.
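The figure of about .77 follows directly from the special-case formula for uncorrelated variables, as the sketch below shows. The function name is hypothetical. The last line prints the square of .64, on the usual reading that a squared correlation gives the proportion of variance accounted for.

```python
import math

def required_r_other(r_known, r_multiple=1.0):
    """Solve R^2 = r_12^2 + r_13^2 for r_13 when r_23 = 0."""
    return math.sqrt(r_multiple ** 2 - r_known ** 2)

r_speed_accuracy = 0.64  # theoretical maximum multiple correlation
print(round(required_r_other(r_speed_accuracy), 2))
print(round(r_speed_accuracy ** 2, 2))  # share attributable to speed and accuracy
```

On that reading the squared value is about .41, which is less than one half and accords with the statement that the constant elements of the criterion are "not half-measured."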

The  criterion  correlations  for  the  test  variables  are  presented  in 
Table  IX. 

The obtained correlations are definitely positive but not as high as
the correlations with the speed criterion. The coefficients for the
I. E. R. and W-W series are superior to those for the "intelligence
tests" of the Army Alpha series. The highest of all is the obtained
correlation of .41±.03 with a theoretical maximum correlation of
.57, N = 304, with first year English grades. This correlation seems
too high to express the correlation between mere actual proficiency
in the two subjects, and suggests the possibility that some extraneous
factor such as "amenability to class room discipline" enters as a
common element in these two measures.

TABLE IX

Criterion Correlations: Total Typing Grades

Variable              N     First        Second       Both          Theoretical
                            application  application  applications  maximum
                                                                    correlation

Alpha total           123    .15±.06      .14±.06      .15±.06       .19
Alpha 1               123    .00±.06      .11±.06      .05±.06       .08
Alpha 2               123   -.12±.06     -.05±.06     -.09±.06      -.12
Alpha 3               123    .03±.06     -.06±.06     -.02±.06      -.02
Alpha 4               123    .17±.06      .12±.06      .16±.06       .23
Alpha 5               123    .03±.06      .04±.06      .04±.06       .05
Alpha 6               123    .26±.06      .24±.06      .26±.06       .33
Alpha 7               123    .12±.06      .18±.06      .16±.06       .22
Alpha 8               123   -.01±.06     -.07±.06     -.04±.06      -.08
I. E. R. total        123    .25±.06      .22±.06      .25±.06       .31
I. E. R. 1            123    .13±.06      .22±.06      .19±.06       .24
I. E. R. 2            123    .12±.06      .21±.06      .17±.06       .23
I. E. R. 3            123    .29±.06      .27±.06      .29±.06       .37
I. E. R. 4            123    .12±.06      .01±.06      .07±.06       .11
I. E. R. 5            123    .10±.06      .10±.06      .11±.06       .14
I. E. R. 6            123    .10±.06      .03±.06      .08±.06       .11
I. E. R. 7            123    .12±.06      .03±.06      .08±.06       .11
I. E. R. 8            123    .12±.06      .05±.06      .09±.06       .12
I. E. R. 9            123    .13±.06      .11±.06      .14±.06       .19
I. E. R. 10           123    .11±.06      .20±.06      .17±.06       .23
W-W n. g. c.           77    .29±.07      .45±.06      .39±.07       .48
  (ditto)             137    .17±.06      .20±.05      .19±.06       .23
W-W d. c.              77    .31±.07      .22±.07      .28±.07       .35
  (ditto)             137    .05±.06      .18±.06      .13±.06       .17
W-W subst.             77    .26±.07      .30±.07      .34±.07       .54
  (ditto)             137    .23±.05      .19±.06      .25±.05       .37
Otis                  123    .08±.06
  (ditto)             153    .20±.05
F. Y. E. G.           304    .41±.03                                 .57
Chron. Age            304   -.21±.04                                -.25
Student activities    240    .18±.04                                 .25

As in the case of the other three criteria, a study was made to
determine whether the reliabilities of the tests and the "rate of
required responses" were related to the magnitude of the criterion
correlations upon total typing grades, and also whether the tests
which were good predictors of the objective criteria were also good
predictors of the teachers' marks. The following rank order
correlations were obtained against the T. T. G. criterion correlation
coefficients:

For reliabilities, including W-W substitution          ρ = .37, N = 21
For reliabilities, exclusive of W-W substitution       ρ = .53, N = 20
For rate of required responses                         ρ = .37, N = 21
For speed criterion correlation coefficients           ρ = .61, N = 21
For accuracy criterion correlation coefficients        ρ = .40, N = 21
For I. T. C. criterion correlation coefficients        ρ = .61, N = 21

Accordingly,  it  is  apparent  that  the  reliability  and  rate  of  responses
are  important  considerations  in  the  selection  of  tests  to  be  employed 
against  a  criterion  based  upon  teachers'  marks  as  well  as  against  the 
objective  criteria,  and  it  is  significant  that  the  same  psychological 
tests  tend  to  be  good  predictors  of  all  four  criteria. 
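
Rank order coefficients of the kind listed above can be computed from two paired lists of values. The following is a minimal Python sketch, assuming the classic Spearman formula ρ = 1 - 6Σd²/[n(n² - 1)] (the text does not state which computing formula was used). The function names and the five-test data are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation by the sum-of-squared-rank-differences formula.

    The formula is exact only when there are no ties; with ties it is an
    approximation.
    """
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data: reliabilities of five tests and their criterion correlations.
reliabilities = [0.90, 0.80, 0.70, 0.60, 0.50]
criterion_rs = [0.30, 0.10, 0.25, 0.20, 0.05]
print(round(spearman_rho(reliabilities, criterion_rs), 2))  # 0.7
```

With only 21 or 20 pairs, as in the coefficients above, a single ρ of this kind is subject to a large sampling error, which is why the consistency of sign across coefficients carries more weight than any one value.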


CHAPTER  VIII 

The Contribution of this Study with Reference to Predictive
Testing in Typing

The technical measure of the predictive value of a battery of
psychological tests and similar variables is the quality and magnitude
of their correlation coefficients calculated against valid criteria of
proficiency. The problem of organizing a predictive investigation
consists in (1) the choice of an experimental group which is
representative of the population in question and large enough to give
reliability to the obtained coefficients, (2) the choice of criteria
which are valid measures of typing ability and also of known
reliability, (3) the choice of a group of test variables which
correlate highly with the accepted criteria and, if possible,
negligibly with the other test variables of the group, and (4) the
choice of technical procedures which not only state clearly and
quantitatively the correlational results obtained but also show equally
clearly the reliability of the results in terms of their "standard
errors" or "probable errors." The contributions, both positive and
negative, of this study as they appear to the writer will be summarized
in the succeeding pages.

The  ideal  predictive  experiment  is  one  in  which  the  tests  are  ad- 
ministered before  the  students  have  begun  their  course  of  training, 
at  the  time  when  prediction  is  to  be  made.  This  would  permit  follow- 
ing the  subjects  through  their  entire  course  and  a  much  more  precise 
evaluation  of  the  prophetic  value  of  the  tests  would  be  possible.  A 
far-reaching  selective  process  goes  on  during  a  course  of  training 
whereby  the  remainder  who  have  survived  through  to  the  end  of 
their  course  represent  a  restricted  range  of  ability  in  which  almost  all 
of  the  potential  failures  have  already  been  eliminated.  This  limited 
range  of  our  experimental  group  means,  then,  that  our  obtained  cor- 
relation coefficients  are  lower  than  those  which  would  have  been 
obtained  upon  a  group  which  included  the  usual  number  of  potential 
failures.  On  the  other  hand,  the  fact  that  in  our  experiment  the 
psychological  tests  and  the  typing  measures  were  taken  close  together 
in  point  of  time  means  that  the  obtained  correlations  are  probably 
higher  than  those  which  would  be  obtained  if  the  two  measures  were 
taken  so  far  apart  in  time  that  a  differential  development  of  various 
psychological  functions  could  have  taken  place  during  the  interval.
The  resultant  effect  of  these  two  factors  cannot  be  estimated,  but  it
is  entirely  possible  that  our  investigation  if  repeated  upon  such  an 
ideal  experimental  population  may  show  some  significantly  different 
results. 

A  fundamental  study  of  criterion  measures  should  from  all  con- 
siderations precede  the  study  of  the  behavior  of  test  variables,  but 
one  notes  that  in  most  of  the  reported  contributions  the  chief  efforts 
have  been  expended  upon  the  latter. 

In  the  present  investigation  four  criterion  measures  have  received 
a  parallel  study,  three  of  which  were  based  upon  an  objective  typed 
production.  Objective  measures  are  generally  to  be  preferred  where 
obtainable.  The  fourth  criterion  based  upon  total  typing  grades, 
however,  appears  to  add  many  factors  not  included  in  the  objective 
measures  and  perhaps  too  important  to  be  neglected. 

A  speed  criterion  based  upon  gross  words  is  unquestionably  an 
essential  factor.  It  is  easy  to  obtain,  is  highly  reliable,  and  also 
tends  to  yield  promising  correlations  with  psychological  tests  and 
other  variables.  It  is  probable  from  general  considerations  that 
speed  scores  should  be  based  upon  stroke  count  rather  than  upon 
word  count,  but  with  the  Kimball  material  in  the  present  study  no 
significant  superiority  was  shown.  This  material,  however,  presum- 
ably was  prepared  with  special  care  to  obtain  homogeneity  since  it 
had  been  "official"  copy  for  a  world's  championship  test. 

The  accuracy  factor  does  not  lend  itself  as  readily  to  measurement. 
From  the  standpoint  of  reliability  it  is  satisfactory:  a  period  of  about
100  minutes  of  speed  testing  upon  our  population  would  theoretically 
yield  a  Spearman-Brown  reliability  in  the  nineties.  It  is  possible 
also  that  its  measure  may  be  improved  by  taking  the  error  count 
upon  a  stroke  basis  instead  of  upon  a  word  basis. 
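
The projection to 100 minutes of testing rests on the Spearman-Brown formula, which can be sketched as follows. The 48-minute testing period is the one used in this study; the starting reliability of .80 is a hypothetical figure chosen only for illustration, and the function names are likewise assumptions.

```python
def spearman_brown(r, k):
    """Reliability of a measure lengthened k times, given its reliability r."""
    return k * r / (1 + (k - 1) * r)


def length_factor_needed(r, target):
    """Lengthening factor k needed to raise reliability r to the target value."""
    return target * (1 - r) / (r * (1 - target))


r_48 = 0.80    # hypothetical reliability of the 48-minute measure
k = 100 / 48   # lengthening the testing period from 48 to 100 minutes

print(round(spearman_brown(r_48, k), 3))          # projected reliability
print(round(length_factor_needed(r_48, 0.90), 2)) # factor needed to reach .90
```

Under these assumed figures a measure with a reliability of .80 at 48 minutes would need to be lengthened 2.25 times, to 108 minutes, to reach .90.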

A  more  serious  problem  is  in  the  quantitative  expression  of  the 
accuracy  factor.  Its  statement  in  terms  of  the  ratio  of  "correct 
words  (or  strokes)"  to  gross  words  (or  strokes)  is  a  readily  com- 
prehensible measure,  but  is  faulty  because  of  the  skewness  of  distri- 
bution, which  also  raises  the  suspicion  of  curvilinearity  of  regression 
in  its  correlation  with  other  variables.  A  newer  method  of  express- 
ing this  relation  in  terms  of  the  algebraic  differences  between  "stand- 
ard measures"  has  many  technical  advantages  over  the  ratio  method, 
and  may  avoid  much  of  this  skewness  of  distribution.  The  latter 
method  was  definitely  considered  for  use  in  the  present  study,  but 
the  ratio  method  was  preferred  because  of  its  greater  ease  of  inter- 
pretation (admittedly  not  a  strong  reason). 

As judged from the correlation with teachers' marks and also with
the arbitrary International Typewriting Contest Rules scoring,
accuracy is to be weighted more heavily than speed, and it is
interesting that accuracy is much more difficult to predict by
psychological test methods; in fact, a representative group of 24
different pencil-and-paper psychological tests showed a distribution of
correlations with accuracy which was not much superior to a
distribution of coefficients which would be obtained upon totally
uncorrelated material.

The advisability of a separate treating of speed and accuracy is
definitely indicated in our results. The fact that the two factors
intercorrelate only about .2 or .3 means that they are almost entirely
distinct elements, and that better success may be attained by seeking
to predict each element individually.

The  arbitrary  International  Typewriting  Contest  Rules  criterion 
does  not  appear  promising.  Its  reliability  is  relatively  low,  which 
means  that  a  given  amount  of  reliability  can  be  attained  only  by  a 
large  expenditure  of  time  in  testing.  Its  weakness  arises  from  its 
treatment  of  omitted  and  repeated  words.  Much  has  been  said  about 
these  factors  in  the  preceding  pages.  While  they  may  appear  to  be 
of  little  importance,  their  effect  upon  the  scoring  is  to  cause  a  need- 
less loss  in  its  reliability  with  no  appreciable  influence  otherwise  upon 
its  qualities  as  a  measurement.  The  evidence  of  our  study  indicates 
that  they  are  little  more  than  chance  occurrences  incapable  of  meas- 
urement in  themselves  because  of  their  almost  entire  lack  of  reli- 
ability. 

It  is  probable  that  the  48  minutes  of  speed  testing  was  insufficient. 
The  conservative  practice  is  that  reliability  coefficients  should  be  at 
least  as  high  as  the  nineties,  and  only  the  speed  criterion  of  our  three 
objective  measures  satisfies  this  desideratum.  It  is  improbable,  how- 
ever, that  our  criterion  correlations  against  accuracy  would  have 
been  much  improved  by  an  increase  in  measure-taking  since  their 
tendency  was  so  close  to  zero. 

Total  typing  grades  deserve  a  place  among  a  series  of  criterion 
measures  because  they  introduce  a  number  of  important  elements 
not  comprehended  in  measures  based  upon  typed  papers.  Though 
not  an  objective  measure  in  the  sense  that  gross  words  and  "accu- 
racy ratio"  are  objective,  still  when  they  comprise  gradings  given  by 
several  teachers  they  assume  a  sort  of  objective  character.  It  has 
been  said  that  ultimately  the  difference  between  objective  and  sub- 
jective is  only  a  statistical  difference. 

But they possess two important defects. First, they are relatively
unreliable, and their reliability cannot be improved beyond a certain
degree by the mere taking of more measures as in the case of measures
based upon a typed production. Even when the markings for six
semesters were employed, their unreliability was so great that their
correlation with any test series whatsoever could not be expected
to reach the magnitude required of a "prognostic test." Secondly,
it is only when the system of teachers' gradings permits a fine
grouping of measures as in the case of numerical marks that the
measures are themselves satisfactory for correlational treatment. When
markings are in literal or descriptive terms of only four to six
categories there is a loss in precision due both to the transmuting
into quantitative terms in order that product moment correlation
may be employed, and to the coarseness of the grouping.

A  criterion  based  upon  typing  grades  is  not  difficult  to  predict  by 
test  methods.  Upon  our  data  a  fairly  substantial  team  correlation 
could  be  built  up  by  multiple  correlation  methods,  but  the  high 
correlations  ultimately  desired  in  predictive  experimentation  would 
not  be  obtainable  unless  some  effective  method  were  devised  to 
improve  the  reliability  of  the  criterion  itself. 

The  fact  that  typing  grades  correlate  much  more  poorly  with 
psychological  tests  of  the  "intelligence"  type  as  in  Army  Alpha  and 
Otis  than  with  the  type  of  test  in  the  I.  E.  R.  and  W-W  series
indicates  that  intelligence  as  measured  by  group  "intelligence  test" 
scores  is  not  their  chief  predictable  element.  The  fact  that  the  same 
type  of  test  which  shows  higher  correlation  with  the  speed  criterion 
also  shows  higher  correlation  with  teachers'  marks  in  typing  classes 
suggests  that  it  is  actually  the  common  element  of  proficiency  in  typ- 
ing which  is  causing  the  correlation. 

One  important  objective  criterion  variable  unfortunately  could 
not  be  evaluated  in  this  study,  i.e.  a  measure  based  upon  the  time
required  to  achieve  a  certain  level  of  skill  as  employed  by  Toops 
(1923,  pp.  133-134),  Muscio  and  Sowton  (1923,  pp.  355-356)  and 
Brewington  (1922).  Within  our  population  practically  all  students 
were  in  their  sixth  semester  with  only  a  few  in  the  fifth  and  seventh 
semesters  of  training.  Therefore  the  range  was  too  restricted  to 
afford  a  valid  measure  of  this  time  factor.  Such  a  criterion  measure 
would  be  easily  obtainable  in  most  private  commercial  colleges  and 
in  other  systems  under  which  a  student  may  advance  individually 
in  accordance  with  his  personal  aptitude  and  industry.  The  time 
element  is  not  only  an  important  measure  of  the  ability  in  question 
but  may  be  prophetic  of  future  proficiency.  Let  us  suppose  that  two 
students  in  training  achieve  equal  scores  at  the  time  of  testing,  but 
that  one  of  them  has  reached  this  level  after  a  shorter  period  of 
practice.  This  student,  then,  is  probably  in  an  earlier  period  of  her
own  learning  curve  than  in  the  case  of  the  other  student,  so  that  the
time  of  testing  represents  an  intersecting  of  two  different  learning 
curves.  Therefore,  if  the  portions  of  their  learning  curves  preceding 
this  point  of  intersection  are  prophetic  of  the  portions  following  this 
point,  the  former  student  may  ordinarily  be  expected  ultimately  to 
attain  the  higher  level  of  skill. 

The  most  plausible  criterion  based  upon  performance  during 
training  would  be  one  constructed  of  several  elementary  criteria  as 
employed  by  Toops  (1923,  pp.  133-134).  Representatives  of  such 
elementary  factors  would  be  "speed,"  "accuracy,"  the  time  required 
to  attain  a  given  level  of  proficiency,  and  total  typing  grades  or  some 
other  form  of  comprehensive  subjective  rating  serving  the  same 
purpose.  Such  a  constructed  criterion  should  aim  to  be  a  comprehensive
measure  of  all  aspects  of  proficiency.

The  combining  of  such  a  series  of  elementary  criteria  into  a  com- 
posite presents  a  series  of  difficulties.  A  mere  crude  summation  of 
their  obtained  gross  measures  without  regard  to  their  differences  in 
variabilities  would  probably  result  in  an  unexpected  and  indefensible 
system  of  differential  weightings  of  the  variables.  The  transmutation 
of  gross  scores  into  "standard  measures"  (Kelley,  1923,  pp.  114-117) 
would  permit  a  control  of  the  weighting  factor.  A  simple  summation 
of  their  standard  measures  would  mean  that  each  variable  is  of  equal 
importance  in  the  composite,  a  supposition  which  is  improbable. 
If  a  valid  "ultimate  criterion"  were  available,  which  should  probably 
be  based  upon  actual  success  "on  the  job,"  it  would  be  a  simple 
matter  to  ascertain  the  most  justifiable  weightings  by  Yulean  multiple 
correlation  and  regression  methods.  The  obtaining  of  such  an 
ultimate  criterion  upon  success  in  employment  presents  a  new 
series  of  difficulties.  First,  the  number  of  typists  in  a  single  firm  who 
are  engaged  upon  homogeneous  work  is  small,  so  that  if  an  objective 
measure  based  upon  production  records  is  desired  it  is  seldom  possible 
to  find  experimental  groups  of  sufficient  size  to  give  reliability  to  the 
correlational  results.  If  subjective  ratings  by  department  heads  or 
supervisors  are  resorted  to  in  an  effort  to  obtain  measures  upon 
larger  populations  one  encounters  all  the  weaknesses  inherent  in 
subjective  rating  schemes.  One-judge  ratings  are  notoriously  lacking 
in  reliability  and  objectivity.  These  conditions  may  be  improved 
by  taking  the  averages  of  repeated  ratings  by  several  judges,  but 
it  is  not  often  that  a  firm  can  provide  an  adequate  number  of  qualified 
judges  to  bring  the  reliability  and  objectivity  of  such  a  measure  up 
to  the  level  necessary  in  an  ultimate  criterion.  These  obstacles  are 
by  no  means  insurmountable,  but  they  do  mean  that  a  large  amount
of  specialized  investigation  is  necessary,  much  more  than  can  be
envisaged  in  the  prevailing  type  of  "predictive  experiment."

In  the  absence  of  such  an  ultimate  standard  the  only  available 
method  of  weighting  the  several  elements  is  in  accordance  with  the 
consensus  of  opinion  among  experts  of  their  relative  importance. 
(For  the  statistical  treatment  see  Toops,  ibid.) 

Even  with  such  a  multiple  criterion  it  is  advantageous  to  compute 
correlations  for  each  criterion  element  separately  because  of  the 
analysis  afforded  of  the  behavior  both  of  different  test  variables  and 
of  different  criterion  elements.  When  these  coefficients  have  been 
obtained  it  is  possible  without  a  disproportionate  amount  of  statistical
labor  to  ascertain  the  correlation  of  a  test  (T)  with  the  weighted
composite  of  the  criterion  elements  (A,  B,  ...,  n - 1,  n)  by  means  of
the  useful  Spearman  sums-and-differences  formula  (Kelley  No.  147),
the  expansion  of  which  in  this  case  becomes

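\[
r_{T(w_A A + w_B B + \cdots + w_n n)} =
\frac{w_A \sigma_A r_{TA} + w_B \sigma_B r_{TB} + \cdots + w_n \sigma_n r_{Tn}}
     {\sqrt{w_A^2 \sigma_A^2 + w_B^2 \sigma_B^2 + \cdots + w_n^2 \sigma_n^2
            + 2 w_A w_B \sigma_A \sigma_B r_{AB} + \cdots
            + 2 w_{n-1} w_n \sigma_{n-1} \sigma_n r_{(n-1)n}}}
\]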
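
The formula above lends itself to direct computation once the separate criterion correlations are in hand. The following Python sketch is offered only as an illustration: the function name, the argument layout, and the two-element example (a test correlating .30 with speed and .10 with accuracy, the two elements intercorrelating .25 and accuracy weighted double) are hypothetical.

```python
import math


def composite_correlation(r_t, r_cc, sigma, w):
    """Correlation of a test T with a weighted sum of criterion elements.

    r_t[i]     correlation of T with criterion element i
    r_cc[i][j] intercorrelation of criterion elements i and j
    sigma[i]   standard deviation of criterion element i
    w[i]       weight assigned to criterion element i
    """
    n = len(w)
    # Numerator: weighted covariance of T (in standard units) with the composite.
    numerator = sum(w[i] * sigma[i] * r_t[i] for i in range(n))
    # Denominator: standard deviation of the weighted composite.
    variance = sum((w[i] * sigma[i]) ** 2 for i in range(n))
    variance += 2 * sum(
        w[i] * w[j] * sigma[i] * sigma[j] * r_cc[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return numerator / math.sqrt(variance)


# Hypothetical example: speed and accuracy in standard measures (sigma = 1).
r = composite_correlation(
    r_t=[0.30, 0.10],
    r_cc=[[1.0, 0.25], [0.25, 1.0]],
    sigma=[1.0, 1.0],
    w=[1.0, 2.0],
)
print(round(r, 3))
```

When the elements are first reduced to standard measures every sigma is 1, and the weights alone determine the contribution of each element to the composite.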


The data of this experiment make possible a comparison of the
relative merits of two familiar types of pencil-and-paper psychological
tests, the "intelligence test" and the "association test." The
distinction made here does not attempt to be a very fundamental one:
a test is classified as the former if it is of the type commonly used as
a component of a "group intelligence test" such as Army Alpha or
Otis, and as the latter if it resembles the tests of the
Woodworth-Wells (1911) series of "association tests." Such distinction
as there exists appears to be that the "intelligence test" involves
single items devised to call into play such complex mental functions as
reasoning or memory, so that the score obtained indicates how difficult
a task the subject can perform in addition to how quickly he can make a
reaction, while the "association test" is made up of single items
requiring only the simplest sort of discrimination, so that the score
depends almost entirely upon the speed with which simple reactions
can be made. This difference is usually shown also in the structure
of a test: in the former type the items are ordinarily arranged in an
increasing order of difficulty, while in the latter the arrangement is
usually on a studied chance basis in order that the difficulty of all
portions of the test may be equalized. Among the "intelligence
tests," then, the writer includes Army Alpha and Otis and their
elements, and also tests 4, 5, 6, 7 and 8 of the I. E. R. series, and
among the "association tests" are included the three W-W tests and also
tests 1, 2, 9 and 10 of the I. E. R. series.

It will be noted that in the case of the two criteria where tests
manifest any correlation at all, i. e., the speed and total typing grades
criteria, the "association tests" are definitely superior to the
"intelligence tests" in predictive value. The preference for this type
of test in previously reported prognostic experimentation appears to be
supported by our results. One notes also that the I. E. R. series
consisting both of "association tests" and of "intelligence tests,"
which were assembled by their authors upon experimental evidence
to make up a test series prognostic of clerical abilities, actually does
correlate with typing to a greater degree than the series of tests of
Army Alpha and Otis which were designed to measure "general
intelligence." Our findings suggest, then, that the sort of
pencil-and-paper psychological tests which may be expected to show
correlation with typing proficiency are something more specialized than
mere "intelligence tests." Intelligence tests upon our data showed
little promise.

Unfortunately  only  one  representative  of  the  "motor  test"  could 
be  included  in  this  study.  This  was  the  I.  E.  R.  3,  which  consists  of 
the  rate  of  copying  long  numbers  correctly  on  the  back  of  the  sheet. 
It  showed  relatively  good  correlations  with  all  criteria  except  "accu- 
racy." 

The  instrumental  "motor"  or  "psychomotor"  test  of  the  psycho- 
logical laboratory  could  not  be  sampled  in  this  study  because  of  the 
lack  of  time  for  individual  testing.  It  is  possible  however  that  a 
closer  individual  analysis  of  the  psychophysical  organization  of  those 
who  become  proficient  typists  may  repay  the  expenditure  of  time. 
For  the  pencil-and-paper  tests  upon  a  time  limit  basis  are  really  quite 
similar  since  they  tend  to  intercorrelate  much  more  highly  with  each 
other  than  with  outside  criterion  measures. 

The three non-laboratory variables, first year English grades,
chronological age, and "student activities" showed relatively good
correlations against all four criteria. It is probable that a battery of
such variables can be made up of various personal data concerning a
student which will yield criterion correlations of about the same
magnitude as those obtained from the conventional psychological tests.

In the selection of tests to be assayed the outstanding necessity
indicated by our results is for those which will predict the important
accuracy factor. It seems unlikely that the conventional time-limit
"pencil-and-paper" tests hold much promise; the experimenter must
look for new types of variables.*

Our results indicate that the higher the reliability of a test, and the
greater its rate of required reactions, the higher will be its
correlation against all four criteria. Also, tests which tend to predict
one criterion tend to predict all criteria. Although our rho correlations
based upon only 21 or 20 cases were only moderately "statistically
reliable" when considered individually, it is significant that the fifteen
coefficients involving six different variables were all definitely
positive, the obtained values ranging from .22 to .76.

Since the criterion correlation coefficients for single tests or
variables are likely to be low, it is necessary to employ multiple
regression and correlation methods in order to ascertain the maximum
joint correlation of a team of single variables. From the technical
standpoint such a battery should consist of variables showing high
criterion correlations but low or, if possible, zero, intercorrelations.
A crude estimate of the predictive value of a single variable in these
respects may be made in the following manner. Pearson (1914) and
also Hull (1923) show that if all the criterion correlations are equal
and all the intercorrelations of the other variables are equal, the
multiple-R of an infinite number of such variables is

\[
R = \frac{r'}{\sqrt{r''}}
\]

wherein r' denotes any single equal criterion correlation and r'' any
single equal intercorrelation. The predictive value of a single
variable, then, may be estimated roughly as the ratio of its criterion
correlation to the square root of its average intercorrelation with the
other variables included in the composite. Thus, a battery of
variables each correlating only .30 with a criterion but
intercorrelating as low as .16 with the other variables of the series is
theoretically capable of yielding a higher multiple-R than a battery of
variables correlating .40 with a criterion but intercorrelating .36
with the other variables. For this reason it is necessary to seek
variety in the test variables in order to avoid this overlapping of
measurement; the weakness of the usual pencil-and-paper psychological
tests is their tendency to high intercorrelation.
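
The comparison of the two batteries can be checked numerically. The sketch below assumes the standard finite-battery result for equal criterion correlations r' and equal intercorrelations r'', namely R² = n r'² / [1 + (n - 1) r''], of which the infinite-battery value r'/√r'' is the limit; the finite form is not given in the text, and the function names are hypothetical.

```python
import math


def multiple_r(r_crit, r_inter, n):
    """Multiple R of n variables with equal criterion and equal intercorrelations."""
    return r_crit * math.sqrt(n / (1 + (n - 1) * r_inter))


def multiple_r_limit(r_crit, r_inter):
    """Limiting multiple R as the number of such variables grows without bound."""
    return r_crit / math.sqrt(r_inter)


# The two batteries compared in the text.
print(round(multiple_r_limit(0.30, 0.16), 2))  # 0.75
print(round(multiple_r_limit(0.40, 0.36), 2))  # 0.67

# Under the assumed finite form the ordering depends on the size of the battery.
for n in (5, 12):
    print(n, round(multiple_r(0.30, 0.16, n), 3), round(multiple_r(0.40, 0.36, n), 3))
```

Under this assumed finite form the advantage of the low-intercorrelation battery appears only when roughly a dozen or more variables are combined; with five variables the battery correlating .40 still gives the higher multiple-R.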

* Although the "accuracy ratio" must be a measurable fact because of its
substantial reliability, it is interesting that its correlations with
analogous "correctness ratios" upon the two Army Alphas, whose obtained
reliability was .82 ±.02, were only .09 ±.06 in each case with a
theoretical maximum correlation of only .11, N = 123. It is still
possible, however, that "accuracy," as distinguished from
"correctness," if obtained upon suitable test material may show
correlation with accuracy in typing.

The giving of all tests in duplicate when possible has greatly
increased the labor, but the procedure has been entirely justifiable.

First, the reliability coefficient for each variable is becoming
recognized as a necessary datum in careful correlational
experimentation since this factor places a limit upon the possible size
of an obtained correlation. Kelley (1921, pp. 372-373) shows
theoretically on the assumptions underlying the "correction for
attenuation" that two variables cannot be expected to yield a
correlation greater than the geometric mean of their obtained
reliability coefficients. While the reliability of most standardized
psychological tests is large enough to preclude the likelihood of a
serious limitation in this respect, with many of the unusual and unique
variables reported in the literature one may question whether their
reliability is sufficient to support substantial correlations. How
large a reliability should an experimenter strive to secure? Since this
factor is a function of the amount of testing or measure-taking the
answer depends upon the degree of precision desired. If the purpose of
a study is merely to assay a very large number of test variables for
their predictive value, it is probably more economical ultimately to be
content with moderate reliabilities ranging from about .6 to .8 for the
test variables. But if the purpose is definitely the constructing of a
predictive test series, the higher the reliabilities the better. The
authors of omnibus "intelligence tests" and school "achievement tests"
are reporting reliability coefficients in the nineties, and a
specialized "prognostic test" should not fall below this standard.

Secondly, the giving of two alternative forms of a test permits
the calculation of "theoretical maximum correlations," the purpose
of which is to provide an estimate of what the underlying correlation
would be if the variables were perfectly reliable. Although such
coefficients are highly speculative and must be rigorously
distinguished from "obtained correlations," especially in the case of
small populations, they may under careful interpretation enable
one to appraise the ultimate predictive value of a test or type of
test material.
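
Both uses of the reliability coefficient can be put in computational form. The sketch below assumes that the "theoretical maximum correlation" is the usual Spearman correction for attenuation, that is, the obtained correlation divided by the geometric mean of the two reliabilities; the function names are hypothetical, and the reliabilities of .80 and .65 are illustrative figures, not values from this study.

```python
import math


def correlation_ceiling(rel_x, rel_y):
    """Largest correlation two variables of the given reliabilities can be expected to yield."""
    return math.sqrt(rel_x * rel_y)


def theoretical_maximum(r_obtained, rel_x, rel_y):
    """Obtained correlation corrected for attenuation in both variables."""
    return r_obtained / math.sqrt(rel_x * rel_y)


# Hypothetical reliabilities for a test and a criterion.
rel_test, rel_criterion = 0.80, 0.65
print(round(correlation_ceiling(rel_test, rel_criterion), 2))        # ceiling on obtained r
print(round(theoretical_maximum(0.41, rel_test, rel_criterion), 2))  # corrected value of r = .41
```

The lower the two reliabilities, the larger the gap between the obtained and the corrected coefficient, and the more speculative the corrected value becomes.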

Thirdly, the giving of tests in duplicate enables the experimenter
to ascertain the stability of a test under repeated applications.
While the coefficients obtained upon both givings were usually
comparable, there were occasional divergences which must be accounted
for. It may be that these were merely sampling errors due to small
populations, or it may be that the psychological character of a test
changes under repeated givings. The possibility of the unreliability
factor may be ascertained by calculating the probability that a
divergence as great as the obtained one could have arisen from
insufficient sampling in material in which there was no true difference.
The general formula for the probable error of the difference between
the correlations of a criterion variable I against two other variables
A and B is

\[
\mathrm{P.E.}_{(r_{IA} - r_{IB})} = .67448975
\sqrt{\sigma^2_{r_{IA}} + \sigma^2_{r_{IB}}
      - 2\,\sigma_{r_{IA}}\,\sigma_{r_{IB}}\,r_{r_{IA} r_{IB}}}
\]

For the factors under the radical see Filon and Pearson (1898)
Nos. xvi and xxxvii or Kelley Nos. 108b and 129. Written in terms
of the three intercorrelations of the three variables the equation
becomes

\[
\mathrm{P.E.}_{(r_{IA} - r_{IB})} = .67448975
\sqrt{\frac{(1 - r_{IA}^2)^2 + (1 - r_{IB}^2)^2
            - 2 r_{AB}(1 - r_{IA}^2)(1 - r_{IB}^2)
            + r_{IA} r_{IB}\,(1 - r_{IA}^2 - r_{IB}^2 - r_{AB}^2 + 2 r_{IA} r_{IB} r_{AB})}{N}}
\]

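
The second form of the equation requires only the three intercorrelations and N. A minimal Python sketch follows; the function name is hypothetical, and the value .56 used for the correlation between the two forms (r_AB) is an assumed figure, since the text does not report it.

```python
import math

PE_FACTOR = 0.67448975  # probable error = PE_FACTOR * standard error


def pe_of_difference(r_ia, r_ib, r_ab, n):
    """Probable error of r_IA - r_IB when A and B are measured on the same N cases."""
    a2, b2, c2 = r_ia ** 2, r_ib ** 2, r_ab ** 2
    under_radical = (
        (1 - a2) ** 2
        + (1 - b2) ** 2
        - 2 * r_ab * (1 - a2) * (1 - b2)
        + r_ia * r_ib * (1 - a2 - b2 - c2 + 2 * r_ia * r_ib * r_ab)
    )
    # Guard against a tiny negative value produced by rounding.
    return PE_FACTOR * math.sqrt(max(under_radical, 0.0) / n)


# Two obtained coefficients on N = 123, with an assumed r_AB of .56.
pe = pe_of_difference(-0.1407, 0.0652, 0.56, 123)
print(round(pe, 4))
```

The higher the correlation between the two forms, the smaller the probable error of the difference, so that a given divergence between two applications becomes harder to attribute to sampling.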
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In  our  data  the  most  extreme  divergence  is  in  the  correlations  of 
Alpha  3  #  I.  T.  C.  scoring  in  which  the  obtained  coefficients  were 
-.1407  and  .0652,  N  =  123.  The  probable  error  of  this  difference 
is  .0563.  The  absolute  difference  of  .2059,  then,  is  about  3.66  times 
its  probable  error,  a  divergence  which  on  the  assumption  of  normal- 
ity in  the  distributions  both  of  the  measures  and  of  the  r's  calculated 
upon  random  sampUngs  of  the  hypothetical  total  population  of 
which  our  samphng  is  a  part  could  be  expected  to  arise  from  chance 
fluctuations  only  once  in  about  73  trials.  The  difference  between  the 
two  correlations  for  I.  E.  R.  6  #  "accuracy"  (r's  =  .1300  and  -.0505, 
N  =  123)  is  3.41  times  its  probable  error,  a  result  which  should  occur 
from  chance  only  once  in  about  46  trials.  For  the  correlations  of 
W-W  form  substitution  #  "speed"  (r's  =  .3080  and  .1511,  N  =  77) 
the  difference  is  only  1.9  times  its  probable  error,  a  result  which 
should  occur  from  chance  at  least  once  in  about  5  trials.  An  inspec- 
tion of  the  criterion  correlations  of  this  study  as  a  group  shows  that 
extreme  divergences  of  this  kind  are  too  numerous  to  be  accounted 
for  on  the  basis  merely  of  sampling  errors.  When  the  possibility  of 
mistakes  in  tabulation  and  computation  has  been  ruled  out,  the 
meaning  is  that  in  many  cases  the  character  of  the  measure  taking 
is  not  stable  upon  repetition,  whether  because  of  some  inherent 
weaknesses  in  certain  measures,  or  because  of  some  changed  attitudes 
on  the  part  of  the  experimental  subjects. 
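
As an illustrative sketch only, the probable error of the difference and the "once in so many trials" figures may be computed as follows. The function names are hypothetical, the text does not report the intercorrelation of A and B for any of the three cases (so r_ab must be supplied by the reader), and the odds are assumed to be two-sided normal probabilities, which appears to be the convention behind the figures quoted above.

```python
import math

# A probable error is this multiple of the standard error.
PE_FACTOR = 0.67448975


def pe_of_difference(r_ia, r_ib, r_ab, n):
    """Probable error of (r_IA - r_IB), written in terms of the
    three intercorrelations of I, A and B and the sample size n."""
    bracket = (
        (1 - r_ia**2) ** 2
        + (1 - r_ib**2) ** 2
        - 2 * r_ab * (1 - r_ia**2) * (1 - r_ib**2)
        + r_ia * r_ib
        * (1 - r_ia**2 - r_ib**2 - r_ab**2 + 2 * r_ia * r_ib * r_ab)
    )
    return PE_FACTOR * math.sqrt(bracket / n)


def trials_per_chance_occurrence(ratio_to_pe):
    """Number of trials in which a divergence at least this many
    probable errors from zero (in either direction) is expected
    once, assuming a normal sampling distribution."""
    z = ratio_to_pe * PE_FACTOR              # convert P.E. units to sigma units
    p_two_sided = math.erfc(z / math.sqrt(2))  # two-tailed normal probability
    return 1 / p_two_sided


if __name__ == "__main__":
    for ratio in (3.66, 3.41, 1.9):
        print(ratio, round(trials_per_chance_occurrence(ratio), 1))
```

Under these assumptions the three ratios named in the text give values close to the 73, 46, and 5 trials there reported.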


The  Correlation  between  Speed  and  Accuracy 

By  Truman  L.  Kelley 
Stanford  University 

From  a  psychological  as  well  as  from  a  pedagogical  point  of  view, 
it  is  valuable  to  know  the  correlation  between  speed  and  accuracy 
when dealing with elementary and fundamental tasks. It has very
frequently  been  found  that  the  more  rapid  pupils  in  a  computation 
test  were  also  the  more  accurate  when  speed  is  measured  by  the 
number  of  problems  attempted  and  accuracy  by  the  number  correctly 
solved.  If  the  measure  of  accuracy  is  a  percentage,  or  a  proportion, 
measure,  then  the  significance  of  the  correlation  between  speed 
(number  of  problems  attempted)  and  accuracy  (proportion  of 
problems  correctly  solved)  is  uncertain.  The  reason  for  this  becomes 
apparent  when  we  express  these  speed  and  accuracy  measures  in 
terms  of  symbols. 

Let $X_1$ equal the number of exercises correctly solved on Form 1
of some given test. Let $X_2$ equal the number incorrectly worked.
Then $(X_1 + X_2)$ = number of attempts and $X_1/(X_1 + X_2)$ = the
proportion correct. Thus $(X_1 + X_2)$ is the measure of speed and
$X_1/(X_1 + X_2)$ is the measure of accuracy. If we consider the
correlation between speed and accuracy as here defined, it is obvious
that when speed, $(X_1 + X_2)$, is large as a matter of chance, then
accuracy, which contains the same magnitude in the denominator,
tends to be small as a matter of chance. Accordingly there is a
chance negative correlation between speed and accuracy when the
speed and accuracy measures are products of the same test. To
avoid this spurious correlation influence we require the correlation
between a speed score derived from one test (Form 1), and an accuracy
score derived from a second and independent test (Form 2). In
order to eliminate the systematic effects of chance we will need to
give 2 forms of the test. In addition, therefore, to $X_1$ and $X_2$,
already defined, we have the following:

$X_3$ = the number of correct responses on Form 2.
$X_4$ = the number of incorrect responses on Form 2.

The correlation $r_{13}$ is the reliability coefficient of the single form for
number of correct responses, and $r_{24}$ is the reliability coefficient of
the single form for number of incorrect responses. Let us ascertain
the meaning of $r_{12}$ and of $r_{34}$. Let $M_1$ equal the mean of the $X_1$
measures. Let $x_1$ equal $(X_1 - M_1)$, and let $x$ with other subscripts
be similarly defined with reference to other $X$ measures. Let $q_{13}$
equal $(\Sigma x_1 x_3)/N = \sigma_1 \sigma_3 r_{13}$, and similarly for $q$ with other subscripts.
Let $x_\infty$ equal the individual's true number of correct responses score,
let $x_\omega$ be his true number of incorrect responses score, and let $e_1$
and $e_2$ be the errors in $x_1$ and $x_2$. Then

\[
x_1 = x_\infty + e_1, \quad \text{and} \quad x_2 = x_\omega + e_2
\]

and similarly,

\[
x_3 = x_\infty + e_3, \quad \text{and} \quad x_4 = x_\omega + e_4.
\]

It has ordinarily been assumed that the various e's are uncorrelated,
but in this particular problem we may not rightfully make this
assumption, because $e_1$ and $e_2$ occur at the same sitting, and if, as a
matter of chance upon a particular test, an individual has a large $e_1$
(i.e. he gets more problems correct than he would do on the average),
then it is very likely that his $e_2$ will be small (i.e. he gets fewer
problems incorrect than he would do on the average). Accordingly, $e_1$
and $e_2$ are to be thought of as negatively correlated, and likewise
$e_3$ and $e_4$. Otherwise than as here stated, errors may be thought of
as uncorrelated.
The usual formula for estimated true correlation is:

\[
r_{\infty\omega} = \frac{r_{12}}{\sqrt{r_{13}}\,\sqrt{r_{24}}}
\]

but  under  the  conditions  here  maintaining  (correlation  between 
certain  of  the  errors)  this  equation  is  incorrect,  as  is  obvious  from 
equation  (4)  below.    We  will  therefore  derive  an  appropriate  formula. 


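\[
r_{12} = \frac{q_{12}}{\sigma_1 \sigma_2}
= \frac{\Sigma (x_\infty + e_1)(x_\omega + e_2)}{N \sigma_1 \sigma_2}
= \frac{\Sigma x_\infty x_\omega + \Sigma e_1 e_2}{N \sigma_1 \sigma_2}
= \frac{q_{\infty\omega} + q_{e_1 e_2}}{\sigma_1 \sigma_2} \qquad (1)
\]

\[
r_{13} = \frac{q_{13}}{\sigma_1 \sigma_3} = \frac{\sigma^2_\infty}{\sigma^2_1} \qquad (2)
\]

\[
r_{24} = \frac{q_{24}}{\sigma_2 \sigma_4} = \frac{\sigma^2_\omega}{\sigma^2_2} \qquad (3)
\]

\[
\frac{r_{12}}{\sqrt{r_{13}}\,\sqrt{r_{24}}}
= \frac{q_{\infty\omega} + q_{e_1 e_2}}{\sigma_\infty \sigma_\omega}
= r_{\infty\omega} + \frac{q_{e_1 e_2}}{\sigma_\infty \sigma_\omega}
= r_{\infty\omega} + r_{e_1 e_2}\,\frac{\sigma_{e_1} \sigma_{e_2}}{\sigma_\infty \sigma_\omega} \qquad (4)
\]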

We thus see that $r_{12}$ divided by $\sqrt{r_{13} r_{24}}$ differs from $r_{\infty\omega}$ by
$q_{e_1 e_2}/(\sigma_\infty \sigma_\omega)$. This may be a quite substantial difference.
If, instead of correlating the number right score on Form 1 with
the number wrong score on the same form, we correlate the number
right score on Form 1 with the number wrong score on Form 2, or
the number wrong score on Form 1 with the number right score on
Form 2, we eliminate the correlation between errors. In place of
$r_{12}$ we have $r_{14}$ (or $r_{23}$).

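\[
r_{14} = \frac{q_{14}}{\sigma_1 \sigma_4}
= \frac{\Sigma (x_\infty + e_1)(x_\omega + e_4)}{N \sigma_1 \sigma_4}
= \frac{\Sigma x_\infty x_\omega}{N \sigma_1 \sigma_2}
= \frac{q_{\infty\omega}}{\sigma_1 \sigma_2} \qquad (5)
\]

Similarly,

\[
r_{23} = \frac{q_{\infty\omega}}{\sigma_1 \sigma_2} \qquad (6)
\]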


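In practice it would be well to calculate both $r_{14}$ and $r_{23}$ and take the
average for the intercorrelation between the number right and the
number wrong. Utilizing (5), (2), and (3), we have:

\[
\frac{r_{14}}{\sqrt{r_{13}}\,\sqrt{r_{24}}} = \frac{q_{\infty\omega}}{\sigma_\infty \sigma_\omega} = r_{\infty\omega} \qquad (7)
\]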


This  is  the  required  estimated  true  correlation  between  number 
right  and  number  wrong.  It  is  simply  Spearman's  formula  for 
attenuation  adapted  to  this  situation  where  there  is  correlation 
between  certain  of  the  errors. 
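
A minimal sketch of formula (7), set beside the usual same-form formula, may make the size of the discrepancy in (4) concrete. The function names and all numerical values below are hypothetical, chosen only for illustration; the averaging of the two cross-form coefficients follows the suggestion made in the text.

```python
import math


def usual_true_correlation(r12, r13, r24):
    """Usual attenuation formula, using the same-form correlation r12.
    It is biased here because e1 and e2 are correlated."""
    return r12 / math.sqrt(r13 * r24)


def cross_form_true_correlation(r14, r13, r24, r23=None):
    """Formula (7): cross-form correlation divided by the geometric
    mean of the two reliability coefficients. If r23 is also given,
    the arithmetic average of r14 and r23 is used in the numerator."""
    cross = r14 if r23 is None else (r14 + r23) / 2
    return cross / math.sqrt(r13 * r24)


if __name__ == "__main__":
    # Hypothetical coefficients, not taken from any data in this study.
    r12, r14, r23 = -0.50, -0.30, -0.34
    r13, r24 = 0.80, 0.60
    print(usual_true_correlation(r12, r13, r24))
    print(cross_form_true_correlation(r14, r13, r24, r23))
```

With these invented figures the same-form estimate is considerably more negative than the cross-form estimate, which is the direction of bias expected when the errors of right and wrong scores at one sitting are negatively correlated.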

If we use a geometric average of $r_{14}$ and $r_{23}$ in place of $r_{14}$ in the
present problem we have:

\[
r_{\infty\omega} = \sqrt{\frac{r_{14}\, r_{23}}{r_{13}\, r_{24}}} \qquad (7a)
\]

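Having a value for $r_{\infty\omega}$ not systematically affected by chance, we
may now calculate the correlation between $(x_\infty + x_\omega)$, the true speed
score, and $x_\infty$, the true number right score. Noting that

\[
\sigma_\infty = \sigma_1 \sqrt{r_{13}} \qquad (8)
\]

and

\[
\sigma_{(\infty+\omega)} = \sqrt{\sigma^2_\infty + 2 q_{\infty\omega} + \sigma^2_\omega} \qquad (9)
\]

we have

\[
r_{(\infty+\omega)\infty}
= \frac{\sigma^2_\infty + q_{\infty\omega}}{\sigma_\infty \sqrt{\sigma^2_\infty + 2 q_{\infty\omega} + \sigma^2_\omega}}
= \frac{\sigma_\infty + r_{\infty\omega}\,\sigma_\omega}{\sqrt{\sigma^2_\infty + 2 \sigma_\infty \sigma_\omega r_{\infty\omega} + \sigma^2_\omega}}
\]

\[
= \frac{\sigma_1 r_{13} + \sigma_2 r_{14}}{\sqrt{r_{13}}\,\sqrt{\sigma^2_1 r_{13} + 2 \sigma_1 \sigma_2 r_{14} + \sigma^2_2 r_{24}}} \qquad (10)
\]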

This  is  the  required  formula  for  estimating  the  true  correlation 
between  number  attempted  and  number  right. 
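
The following sketch evaluates formula (10) from observed quantities and checks it against the intermediate form written in terms of the true-score standard deviations. The function names and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def true_corr_attempted_right(sigma1, sigma2, r13, r24, r14):
    """Formula (10): estimated true correlation between number
    attempted and number right, from observed standard deviations
    (sigma1 for rights, sigma2 for wrongs), the two reliability
    coefficients r13 and r24, and the cross-form correlation r14."""
    numerator = sigma1 * r13 + sigma2 * r14
    denominator = math.sqrt(r13) * math.sqrt(
        sigma1**2 * r13 + 2 * sigma1 * sigma2 * r14 + sigma2**2 * r24
    )
    return numerator / denominator


def same_result_via_true_sigmas(sigma1, sigma2, r13, r24, r14):
    """Intermediate form of (10), using formulas (7) and (8)."""
    s_inf = sigma1 * math.sqrt(r13)        # true sigma of rights, formula (8)
    s_om = sigma2 * math.sqrt(r24)         # true sigma of wrongs, by analogy
    r_true = r14 / math.sqrt(r13 * r24)    # formula (7)
    return (s_inf + r_true * s_om) / math.sqrt(
        s_inf**2 + 2 * s_inf * s_om * r_true + s_om**2
    )


if __name__ == "__main__":
    # Hypothetical values, not taken from any data in this study.
    print(true_corr_attempted_right(5.0, 3.0, 0.80, 0.60, -0.32))
```

The two functions agree because the final member of (10) is only the intermediate member with (7) and (8) substituted.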
If we are concerned not with the total number of attempts and
the total number of rights, but with the total number of attempts
and the proportion of rights, we have for our true scores:

\[
X_s = X_\infty + X_\omega = \text{true speed score}
\]

\[
\frac{X_\infty}{X_\infty + X_\omega} = \frac{X_\infty}{X_s} = \text{true proportion right score}
\]

The correlation between $X_s$ and $X_\infty/X_s$ is a special case of a
correlation of a variable with a quotient variable. It is sufficiently
different from the usual quotient correlation problem to call for
separate derivation. The usual formula is (see Pearson, Karl,
Mathematical Contributions to the Theory of Evolution, Proc. Roy.
Soc. Vol. 60, 1896-7, p. 489, also given by Holzinger, K. J., Formulas
for the correlation between ratios, J. Ed. Psych., Sept. 1923)

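\[
r_{(X_a/X_b)(X_c/X_d)} = \frac{r_{ac} v_a v_c + r_{bd} v_b v_d - r_{ad} v_a v_d - r_{bc} v_b v_c}{\sqrt{(v_a^2 - 2 r_{ab} v_a v_b + v_b^2)(v_c^2 - 2 r_{cd} v_c v_d + v_d^2)}} \qquad (11)
\]

in which the four variables are $X_a$, $X_b$, $X_c$, and $X_d$ and the two
ratios which are correlated are $X_a/X_b$ and $X_c/X_d$. If $\sigma_a$, $\sigma_b$, $\sigma_c$, $\sigma_d$
are the four standard deviations and $M_a$, $M_b$, $M_c$, $M_d$ the four means,
the v's are defined by the equations

\[
v_a = \sigma_a/M_a; \quad v_b = \sigma_b/M_b; \quad v_c = \sigma_c/M_c; \quad v_d = \sigma_d/M_d
\]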

This formula (11) is applicable only when the v's are sufficiently
small that the fourth power terms are negligible in comparison with
square terms.

Utilizing formula (11) for the special case of correlation between
speed and accuracy we have


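\[
X_a = X_s; \quad X_b = 1; \quad X_c = X_\infty; \quad X_d = X_s
\]

Thus

\[
v_b = 0; \quad v_a = v_d = \frac{\sigma_s}{M_\infty + M_\omega} = \frac{\sigma_{(\infty+\omega)}}{M_1 + M_2} \quad \text{(See formula 9)}
\]

\[
v_c = \frac{\sigma_\infty}{M_\infty} = \frac{\sigma_\infty}{M_1} \quad \text{(See formula 8)}
\]

Further

\[
r_{ac} = r_{cd} = r_{\infty s} \quad \text{(See formula 10)}
\]

\[
r_{bd} = 0; \quad r_{ad} = 1; \quad r_{bc} = 0; \quad r_{ab} = 0
\]

Substituting in (11) we have, after certain simple reductions

\[
r_{X_s (X_\infty/X_s)} = \frac{r_{ac} v_a v_c - v_a^2}{v_a \sqrt{v_c^2 - 2 r_{cd} v_c v_d + v_d^2}}
= \frac{r_{\infty s} v_c - v_a}{\sqrt{v_c^2 - 2 v_c v_d r_{\infty s} + v_d^2}} \qquad (12)
\]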


(The correlation between number
of exercises attempted and
proportion that are right, corrected
for attenuation.)
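
A short sketch of formula (12) follows. The function name and the numerical values are hypothetical; $v_s$ stands for the common value of $v_a$ and $v_d$, and the correlation $r_{\infty s}$ is taken as already obtained from formula (10). As with formula (11), the result is only an approximation valid for small coefficients of variation.

```python
import math


def true_corr_speed_accuracy(r_inf_s, v_c, v_s):
    """Formula (12): estimated true correlation between number
    attempted and proportion right.

    r_inf_s : true correlation of number right with number attempted
    v_c     : sigma_inf / M1, coefficient of variation of true rights
    v_s     : sigma_(inf+omega) / (M1 + M2), coefficient of variation
              of true attempts (this is v_a = v_d in the text)
    """
    numerator = r_inf_s * v_c - v_s
    denominator = math.sqrt(v_c**2 - 2 * v_c * v_s * r_inf_s + v_s**2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical values, not taken from any data in this study.
    m1, m2 = 40.0, 10.0                    # mean rights and mean wrongs
    sigma_inf = 5.0 * math.sqrt(0.80)      # formula (8)
    sigma_s = math.sqrt(15.8)              # formula (9), assumed value
    v_c = sigma_inf / m1
    v_s = sigma_s / (m1 + m2)
    print(true_corr_speed_accuracy(0.855, v_c, v_s))
```

The sign of the result turns on whether $r_{\infty s} v_c$ exceeds $v_s$: the proportion right rises with speed only when the relative variability of the true rights, weighted by its correlation with attempts, is greater than the relative variability of the attempts themselves.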